Open Access Journal Article

Genetic Analysis Workshop 17 mini-exome simulation.

Abstract
The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of the data that might be produced in a full exome screen for a complex disorder and related risk factors, so that workshop participants could investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in a sample of 697 unrelated individuals and in a second sample of 697 individuals in large, extended pedigrees. Called genotypes for 24,487 autosomal markers assigned to 3,205 genes, together with simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking, were provided to workshop participants. The simulating model included both common and rare variants, with minor allele frequencies ranging from 0.07% to 25.8% and a wide range of effect sizes for these variants. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were chosen on the basis of the predicted deleteriousness of the coding change. For each of the two samples (unrelated individuals and families), 200 replicates of the phenotypes were simulated.


Citations

A Powerful and Adaptive Association Test for Rare Variants

TL;DR: An adaptive SPU (aSPU) test is proposed to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios.

Brief review of regression‐based and machine learning methods in genetic epidemiology: the Genetic Analysis Workshop 17 experience

TL;DR: A brief review of the machine learning and regression‐based methods used in the analyses of common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates is provided.

Robust and Powerful Tests for Rare Variants Using Fisher's Method to Combine Evidence of Association From Two or More Complementary Tests

TL;DR: Fisher's method consistently outperforms the minimum‐p and the individual linear and quadratic tests, as well as the optimal sequence kernel association test, SKAT‐O, and is robust across models with varying proportions of causal, deleterious, and protective rare variants, allele frequencies, and effect sizes.

The group exponential lasso for bi-level variable selection.

TL;DR: This work proposes a new approach to penalized regression called the group exponential lasso (GEL) which features a decay parameter controlling the degree to which feature selection is coupled together within groups.

Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results

TL;DR: This article reviews the performance of a wide range of test strategies for assessing association between a group of rare variants and a trait, in light of competing claims about their performance.