Open Access Journal Article

Capturing heterogeneity in gene expression studies by surrogate variable analysis.

Jeffrey T. Leek, +1 more
01 Jan 2005, Vol. 3, Iss. 9, pp. 1724-1735
TL;DR: This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Abstract
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
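
The idea that an unmodeled factor leaves a shared signal across many genes can be illustrated with a small simulation. The sketch below is not the SVA algorithm from the paper; it is a deliberately simplified stand-in that regresses out the measured variable gene by gene and then takes the leading singular vector of the residuals as a crude estimate of the hidden factor. The function names (`simulate`, `estimate_hidden_factor`), the effect sizes, and the data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(seed=0, n_genes=500, n_samples=40):
    """Simulate a genes x samples matrix with one measured and one hidden factor.

    All sizes and effect strengths are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    primary = np.repeat([0.0, 1.0], n_samples // 2)   # measured two-group variable
    hidden = rng.normal(size=n_samples)               # unmeasured factor
    beta = np.zeros(n_genes)
    beta[:50] = 1.0                                   # 50 genes respond to the primary variable
    gamma = np.zeros(n_genes)
    gamma[25:225] = rng.normal(scale=2.0, size=200)   # 200 genes respond to the hidden factor
    noise = rng.normal(size=(n_genes, n_samples))
    expr = np.outer(beta, primary) + np.outer(gamma, hidden) + noise
    return expr, primary, hidden


def estimate_hidden_factor(expr, primary):
    """Return the leading residual singular vector (one value per sample)
    and all singular values of the residual matrix."""
    n_samples = expr.shape[1]
    design = np.column_stack([np.ones(n_samples), primary])
    # Fit intercept + primary effect for every gene at once (samples x genes layout).
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef
    # Left singular vectors of the residuals are patterns across samples.
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, 0], s


if __name__ == "__main__":
    expr, primary, hidden = simulate()
    factor, s = estimate_hidden_factor(expr, primary)
    corr = abs(np.corrcoef(factor, hidden)[0, 1])
    print(f"|correlation| between estimate and true hidden factor: {corr:.2f}")
    print(f"two largest residual singular values: {s[0]:.1f}, {s[1]:.1f}")
```

In this toy setting the estimated vector could be added as a covariate in the per-gene model alongside the measured variable. The published method involves additional steps that this sketch does not attempt to reproduce.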


Citations

The sva package for removing batch effects and other unwanted variation in high-throughput experiments

TL;DR: The sva package is described, which supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function and adjustment for batch and latent variables in prediction problems with the fsva function.

Genetic effects on gene expression across human tissues.

TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

Fast, sensitive and accurate integration of single-cell data with Harmony.

TL;DR: Harmony, for the integration of single-cell transcriptomic data, identifies broad and fine-grained populations, scales to large datasets, and can integrate sequencing- and imaging-based data.

DNA methylation arrays as surrogate measures of cell mixture distribution

TL;DR: This work presents a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples.

Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling

TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
References

Cluster analysis and display of genome-wide expression patterns

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.

Generalized Additive Models.


Principal components analysis corrects for stratification in genome-wide association studies

TL;DR: This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.

The control of the false discovery rate in multiple testing under dependency

TL;DR: In this paper, it was shown that a simple FDR controlling procedure for independent test statistics can also control the false discovery rate when test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses.

Statistical significance for genomewide studies

TL;DR: This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.