Journal • ISSN: 1553-7390

PLOS Genetics

Public Library of Science

About: PLOS Genetics is an open-access academic journal published by the Public Library of Science. It publishes mainly in the areas of genes and regulation of gene expression. Its ISSN is 1553-7390. Over its lifetime, the journal has published 9,738 papers, which have received 722,979 citations. The journal is also known as PLoS Genetics.

Topics: Gene, Regulation of gene expression, Population, Biology, Medicine

Papers

Journal Article

Population structure and eigenanalysis

Nick Patterson¹, Alkes L. Price¹,², David Reich¹,²

Broad Institute¹, Harvard University²

22 Dec 2006-PLOS Genetics

TL;DR: An approach to studying population structure (principal components analysis) is discussed that was first applied to genetic data by Cavalli-Sforza and colleagues, and results from modern statistics are used to develop formal significance tests for population differentiation.

Abstract: Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general “phase change” phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

4,456 citations
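
The principal components analysis described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation and does not include their formal significance tests. The function names, the simulated allele frequencies, and the simple column-centering step are all assumptions made for illustration. It shows only the basic idea: when two populations differ enough in allele frequency, the first principal component of the genotype matrix separates them.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts in {0, 1, 2}.
    Returns (scores, explained), where scores has shape
    (n_individuals, n_components) and explained holds the fraction of
    variance captured by each returned component.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic SNPs: a column with zero variance carries no signal.
    g = g[:, g.std(axis=0) > 0]
    # Center each SNP column (illustrative choice; other normalizations exist).
    centered = g - g.mean(axis=0)
    # SVD of the centered matrix; u * s gives the principal component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def simulate_two_populations(n_per_pop=50, n_snps=2000, shift=0.2, seed=0):
    """Simulate genotypes for two hypothetical populations.

    Population B's allele frequencies differ from population A's by
    +/- shift at every SNP (an arbitrary, deliberately strong divergence).
    """
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.3, 0.7, size=n_snps)
    freq_b = freq_a + rng.choice([-shift, shift], size=n_snps)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([pop_a, pop_b]), labels


if __name__ == "__main__":
    geno, labels = simulate_two_populations()
    scores, explained = genotype_pca(geno)
    pc1 = scores[:, 0]
    print("mean PC1, population A:", pc1[labels == 0].mean())
    print("mean PC1, population B:", pc1[labels == 1].mean())
    print("variance explained by PC1, PC2:", explained)
```

Lowering `shift` or `n_snps` in the simulation makes the separation on the first component progressively weaker, which is the kind of detectability question the abstract's threshold result addresses.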

Journal Article

A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

Bryan Howie¹, Peter Donnelly¹,², Jonathan Marchini¹

University of Oxford¹, Wellcome Trust Centre for Human Genetics²

19 Jun 2009-PLOS Genetics

TL;DR: It is found that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method.

Abstract: Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

3,902 citations

Journal Article

Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

Joseph K. Pickrell¹, Jonathan K. Pritchard¹,²

University of Chicago¹, Howard Hughes Medical Institute²

15 Nov 2012-PLOS Genetics

TL;DR: A statistical model is presented for inferring the patterns of population splits and mixtures in multiple populations, and it is shown that a simple bifurcating tree does not fully describe the data; in contrast, many migration events are inferred.

Abstract: Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.

1,881 citations

Journal Article

Capturing heterogeneity in gene expression studies by surrogate variable analysis.

Jeffrey T. Leek¹, John D. Storey¹

University of Washington¹

Sep 2007-PLOS Genetics

TL;DR: This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.

Abstract: It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.

1,779 citations

Journal Article

Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

Claudia Giambartolomei¹, Damjan Vukcevic², Eric E. Schadt³, Lude Franke⁴, Aroon D. Hingorani¹, Chris Wallace⁵, Vincent Plagnol¹

University College London¹, Royal Children's Hospital², Icahn School of Medicine at Mount Sinai³, University Medical Center Groningen⁴, University of Cambridge⁵

15 May 2014-PLOS Genetics

TL;DR: A novel statistical methodology is developed to assess whether two association signals are consistent with a shared causal variant; its output statistics can be derived from single-SNP summary statistics, making it possible to perform systematic meta-analysis-type comparisons across multiple GWAS datasets.

Abstract: Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). 
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.

1,711 citations

Performance metrics

Papers: 9,917

Citations: 723,546

No. of papers from the Journal in previous years
Year	Papers
2023	182
2022	523
2021	540
2020	558
2019	549
2018	598