\name{subt} \alias{subt} %- Also NEED an '\alias' for EACH other topic documented here. \title{ Subsampling a Microarray Data Set for Estimating Proportion of True Null Hypotheses } \description{ This function subsamples the columns (arrays) of a microarray data set and do two-sample t-tests. Subsamples from each treatment group are obtained and combined. A t-test is conducted for each row (gene) of the subsampled data set and the p-value density at one is estimated for each combined subsample. } \usage{ subt(dat, n1 = round(ncol(dat)/2), n2 = ncol(dat) - n1, f1method = c("lastbin", "qvalue"), max.reps = if(balanced)20 else 5, balanced = FALSE, ...) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{dat}{ a numeric matrix, the microarray data set with each row being a gene, and each column being a subject. The first \code{n1} columns correspond to treatment group 1 and the rest \code{n2} columns correspond to treatment group 2. } \item{n1}{ a positive integer, the original sample size in treatment group 1. } \item{n2}{ a positive integer, the original sample size in treatment group 2. } \item{f1method}{ character, the name of the function to be used to estimate the p-value density at 1. The first argument of the function needs to be a vector of values. } \item{max.reps}{ a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to \code{Inf}, then all possible subsamples will be tried. However, see Notes and the \code{R} argument of \code{\link{combn2R}}. } \item{balanced}{ logical, indicating whether only balanced subsamples are obtained. This is computationally faster and is good for initial exploration purposes. } % \item{totalOnly}{logical. If \code{TRUE}, then only the total number of subsamples are returned, without % doing any actual subsampling. This is used only to see if the computational burden look % reasonable or not before doing any actual computations. } \item{\dots}{ additional arguments used by \code{f1method}. } } \details{ This function tries to get possible subsamples through \code{\link{combn2R}}. \cr For each total subsample size M=3,4,...,N, where N=n1+n2, do the following, \itemize{ \item{1}{For each treatment 1 subsample size m1=1,2,...,n1, let m2=M-m1. If 1<=m2<=n2 and at least one of \code{balanced} and m1=m2 is true, then do the following, \itemize{ \item{1.1}{Randomly choose \code{max.reps} subsamples among all possible subsamples by choosing m1 subjects from treatment group 1 and m2 subjects from treatment group 2, by using the function \code{\link{combn2R}} with \code{sample.method="diff2"} and \code{try.rest=TURE}. Note that this may \emph{not} be always possible due to some pratical computational limitations. See \code{\link{combn2R}} for details.} \item{1.2}{For each subsample obtained in \code{1.1}, (1) do a t-test for each gene (i.e., each row of the subsample), and (2) estimate the p-value density at one.} } } } } \value{ %If \code{totalOnly=TRUE}, only a positive integer is returned, giving the total number of subsamples;\cr %otherwise, an object of class \code{c("subt","matrix")}, which is a G-by-3 numeric matrix, where G is \code{nrow{dat}}, with column names 'f1', 'n1', and 'n2', corresponding to the p-value density at 1 and subsample size in each treatment group. This object also has the following \code{\link{attributes}}, \item{n1}{the same as the argument \code{n1}.} \item{n2}{the same as the argument \code{n2}.} \item{f1method}{the same as the argument \code{f1method}.} \item{max.reps}{the same as the argument \code{max.reps}.} \item{balanced}{the same as the argument \code{balanced}.} } \references{ Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript. } \author{ Long Qu } \note{ \code{max.reps} applies to each subsample size configuration. For example, 2 subjects subsampled from treatment group1 and 3 subjects subsampled from treatment group 2 will be considered as a different subsample size configuration than 3 subjects subsampled from treatment group 1 and 2 subjects subsampled from treatment group 2. For the small sample sizes commonly seen in microarray data, a large \code{max.reps} is rarely a big computational burden. But be careful when you do have a very large sample size, as the number of all possible subsamples grows very fast. } \seealso{\code{\link{print.subt}}, \code{\link{plot.subt}}, \code{\link{extrp.pi0}}, \code{\link{matrix.t.test}},\code{\link{combn2R}}, \code{\link{subex}}, \code{\link{lastbin}}, \code{\link[qvalue]{qvalue}} } \examples{ \dontrun{ set.seed(9992722) ## this is how the 'simulatedDat' data set in this package generated simulatedDat=sim.dat(G=5000) ## this is how the 'simulatedSubt' object in this package generated simulatedSubt=subt(simulatedDat,balanced=FALSE,max.reps=Inf) } data(simulatedSubt) print(simulatedSubt) } % Add one or more standard keywords, see file 'KEYWORDS' in the % R documentation directory. \keyword{ htest } \keyword{ multivariate }% __ONLY ONE__ keyword per line \keyword{ nonparametric }% __ONLY ONE__ keyword per line \keyword{iteration}