Capturing heterogeneity in gene expression studies by surrogate variable analysis.
Jeffrey T. Leek, John D. Storey
TL;DR: This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Abstract:
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
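As a rough illustration of the idea in the abstract, the sketch below simulates expression data with one modeled variable and one unmeasured factor, removes the modeled signal by least squares, and takes the leading singular vector of the residuals as a single surrogate variable. This is a deliberately simplified stand-in, not the published SVA algorithm, which is more involved. The function names, the simulation settings, and the choice of one surrogate variable are all assumptions made for this example.

```python
import numpy as np


def simulate_expression(n_genes=200, n_samples=20, seed=0):
    """Simulate a genes x samples matrix with one modeled and one hidden factor.

    All sizes and effect magnitudes are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Modeled variable of interest: first half of the samples vs. second half.
    group = np.repeat([0.0, 1.0], n_samples // 2)
    # Hidden factor (for example an unrecorded batch), balanced across groups.
    hidden = np.tile([0.0, 1.0], n_samples // 2)
    group_effect = np.zeros(n_genes)
    group_effect[:20] = 1.5  # 20 genes respond to the modeled variable
    hidden_effect = np.zeros(n_genes)
    hidden_effect[50:150] = rng.normal(0.0, 2.0, size=100)  # 100 genes respond to the hidden factor
    noise = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
    expr = np.outer(group_effect, group) + np.outer(hidden_effect, hidden) + noise
    return expr, group, hidden


def estimate_surrogate_variables(expr, design, n_sv=1):
    """Return the top n_sv left singular vectors of the residual matrix.

    expr is genes x samples; design is samples x p and includes an intercept.
    Simplification: the number of surrogate variables is supplied by the caller
    rather than estimated from the data.
    """
    # Per-gene least-squares fit of the modeled variables.
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ beta  # samples x genes
    # Left singular vectors describe sample-level patterns left in the residuals.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_sv]


if __name__ == "__main__":
    expr, group, hidden = simulate_expression()
    design = np.column_stack([np.ones_like(group), group])
    sv = estimate_surrogate_variables(expr, design, n_sv=1)
    corr = abs(np.corrcoef(sv[:, 0], hidden)[0, 1])
    print("absolute correlation between surrogate variable and hidden factor:", corr)
    # The surrogate variable can then be added as a covariate in the per-gene model.
    adjusted_design = np.column_stack([design, sv])
    print("adjusted design shape:", adjusted_design.shape)
```

In this toy setting the estimated surrogate variable can be appended to the design matrix, so a standard per-gene regression on the modeled variable is adjusted for the unmeasured factor as well.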
Citations
The sva package for removing batch effects and other unwanted variation in high-throughput experiments
TL;DR: The sva package is described, which supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function and adjustment for batch and latent variables in prediction problems with the fsva function.
Genetic effects on gene expression across human tissues.
Enhancing GTEx (eGTEx) groups, NIH Common Fund, NHGRI, Biospecimen Core Resource—VARI, ELSI study, Genome Browser Data Integration Visualization—EBI, Lead analysts, Alexis Battle, Christopher D. Brown, Barbara E. Engelhardt, Stephen B. Montgomery
TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Fast, sensitive and accurate integration of single-cell data with Harmony.
Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael B. Brenner, Po-Ru Loh, Soumya Raychaudhuri
TL;DR: Harmony, for the integration of single-cell transcriptomic data, identifies broad and fine-grained populations, scales to large datasets, and can integrate sequencing- and imaging-based data.
DNA methylation arrays as surrogate measures of cell mixture distribution
Eugene Andres Houseman, William P. Accomando, Devin C. Koestler, Brock C. Christensen, Carmen J. Marsit, Heather H. Nelson, John K. Wiencke, Karl T. Kelsey
TL;DR: This work presents a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples.
Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
References
Cluster analysis and display of genome-wide expression patterns
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Principal components analysis corrects for stratification in genome-wide association studies
Alkes L. Price, Nick Patterson, Robert M. Plenge, Michael E. Weinblatt, Nancy A. Shadick, David Reich
TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.
The control of the false discovery rate in multiple testing under dependency
Yoav Benjamini, Daniel Yekutieli
TL;DR: In this paper, it was shown that a simple FDR controlling procedure for independent test statistics can also control the false discovery rate when test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses.
Statistical significance for genomewide studies
John D. Storey, Robert Tibshirani
TL;DR: This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.
Related Papers
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Yoav Benjamini, Yosef Hochberg