Background: Until recently, genome-wide association studies (GWAS) have been restricted to research groups with the budget necessary to genotype hundreds, if not thousands, of samples. Replacing individual genotyping with genotyping of DNA pools in Phase I of a GWAS has proven successful, and dramatically altered the financial feasibility of this approach. When conducting a pool-based GWAS, how well SNP allele frequency is estimated from a DNA pool will influence a study’s power to detect associations. Here we address how to control the variance in allele frequency estimation when DNAs are pooled, and how to plan and conduct the most efficient well-powered pool-based GWAS.

Methods: By examining the variation in allele frequency estimation on SNP arrays between and within DNA pools we determine how array variance [var(e_array)] and pool-construction variance [var(e_construction)] contribute to the total variance of allele frequency estimation. This information is useful in deciding whether replicate arrays or replicate pools are most useful in reducing variance. Our analysis is based on 27 DNA pools ranging in size from 74 to 446 individual samples, genotyped on a collective total of 128 Illumina beadarrays: 24 1M-Single, 32 1M-Duo, and 72 660-Quad.

Results: For all three Illumina SNP array types our estimates of var(e_array) were similar, between 3-4 × 10^-4 for normalized data. Var(e_construction) accounted for between 20-40% of pooling variance across 27 pools in normalized data.

Conclusions: We conclude that relative to var(e_array), var(e_construction) is of less importance in reducing the variance in allele frequency estimation from DNA pools; however, our data suggests that on average it may be more important than previously thought. We have prepared a simple online tool, PoolingPlanner (available at http://www.kchew.ca/PoolingPlanner/), which calculates the effective sample size (ESS) of a DNA pool given a range of replicate array values. ESS can be used in a power calculator to perform pool-adjusted calculations. This allows one to quickly calculate the loss of power associated with a pooling experiment to make an informed decision on whether a pool-based GWAS is worth pursuing.
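The relationship between replicate arrays and ESS can be illustrated with a short Python sketch. This is not the PoolingPlanner implementation, and the abstract does not give its formula. The sketch assumes a common additive model in which the variance of a pooled allele frequency estimate is the binomial sampling variance p(1-p)/(2N) plus var(e_construction) plus var(e_array) divided by the number of replicate arrays, and it defines ESS as the number of individually genotyped samples that would give the same total variance. The function names, the allele frequency of 0.5, the pool size of 200, and the var(e_construction) value of 1.5 × 10^-4 are hypothetical; only the var(e_array) value is taken from the reported 3-4 × 10^-4 range.

```python
def effective_sample_size(n_individuals, p, var_array, var_construction, n_arrays):
    """Approximate ESS of one DNA pool under an ASSUMED additive variance model.

    Assumed model (not taken from the paper's text):
        total_var = p(1-p)/(2N) + var_construction + var_array / n_arrays
    ESS is the N for which p(1-p)/(2N) alone would equal total_var.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie strictly between 0 and 1")
    if n_individuals < 1 or n_arrays < 1:
        raise ValueError("n_individuals and n_arrays must be at least 1")

    # Sampling variance from drawing 2N chromosomes at allele frequency p.
    sampling_var = p * (1.0 - p) / (2.0 * n_individuals)
    # Pooling variance: construction error is shared by all arrays on the
    # same pool, while array error averages down over replicate arrays.
    pooling_var = var_construction + var_array / n_arrays
    return p * (1.0 - p) / (2.0 * (sampling_var + pooling_var))


def ess_by_replicates(n_individuals, p, var_array, var_construction, replicate_counts):
    """Return {number of replicate arrays: ESS} for a range of replicate counts."""
    return {
        k: effective_sample_size(n_individuals, p, var_array, var_construction, k)
        for k in replicate_counts
    }


if __name__ == "__main__":
    # Illustrative inputs only: var_array sits in the reported 3-4e-4 range,
    # the other values are hypothetical.
    table = ess_by_replicates(
        n_individuals=200,
        p=0.5,
        var_array=3.5e-4,
        var_construction=1.5e-4,
        replicate_counts=range(1, 9),
    )
    for k, ess in table.items():
        print(f"{k} replicate array(s): ESS ~ {ess:.1f} of 200 individuals")
```

Under this assumed model, adding replicate arrays shrinks only the array term, so ESS approaches a ceiling set by var(e_construction); raising that ceiling would require replicate pools rather than more arrays on the same pool.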
Earp et al. BMC Medical Genomics 2011, 4:81
BMC Medical Genomics
Estimates of Array and Pool-Construction Variance for Planning Efficient DNA-Pooling Genome Wide Association Studies
Copyright is held by the author(s).
You are free to copy, distribute and transmit this work under the following conditions: You must give attribution to the work (but not in any way that suggests that the author endorses you or your use of the work); You may not use this work for commercial purposes.