Open Access · Journal Article · DOI

A comparison of random forests, boosting and support vector machines for genomic selection

TL;DR: The predictive accuracy of random forests, stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers was evaluated, and the utility of RF for ranking the predictive importance of markers (for pre-screening markers or discovering chromosomal locations of QTLs) was explored.
Abstract
Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs. We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes. The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF. Accuracy was highest for boosting, intermediate for SVMs and lowest for RF but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).
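The evaluation pipeline described in the abstract (three learners, 5-fold cross-validation, Pearson correlation as the accuracy metric, and RF importance scores for ranking markers) can be sketched with scikit-learn. This is a hedged illustration on simulated placeholder data and default-style hyperparameters, not the paper's actual QTLMAS 2010 dataset or tuning:

```python
# Sketch of the comparison described in the abstract: RF, boosting and SVM
# regressors scored by Pearson correlation under 5-fold cross-validation,
# plus RF marker-importance ranking. Data below is a synthetic stand-in
# for the QTLMAS 2010 dataset (SNP genotypes coded 0/1/2, 10 simulated QTLs).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_individuals, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
effects = np.zeros(n_markers)
effects[:10] = rng.normal(0, 1, 10)                 # 10 simulated QTLs
y = X @ effects + rng.normal(0, 1, n_individuals)   # trait = signal + noise

models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "Boosting": GradientBoostingRegressor(n_estimators=200, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0),
}

mean_corr = {}
for name, model in models.items():
    corrs = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        corrs.append(np.corrcoef(pred, y[test])[0, 1])  # Pearson correlation
    mean_corr[name] = float(np.mean(corrs))
    print(f"{name}: mean r = {mean_corr[name]:.3f}")

# Marker importance ranking with RF, as the paper uses to locate QTLs
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("Top-ranked markers:", sorted(int(i) for i in top))
```

The cross-validated Pearson correlation mirrors the paper's accuracy measure; on the real dataset, boosting ranked highest (0.547), followed by SVMs (0.497) and RF (0.483).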



Citations
Journal Article · DOI

M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

TL;DR: M-CAP is developed, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
Journal Article · DOI

Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions

TL;DR: A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTLs to stresses, enabling the modeling of quantitative trait loci by environment interaction (Q*E) on a genome-wide scale.
Journal Article · DOI

Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions

TL;DR: The elastic net, lasso, adaptive lasso and the adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP in terms of the Pearson correlation between predicted GEBVs and the true genomic value as well as the root mean squared error.
Journal Article · DOI

Gradient boosting machine for modeling the energy consumption of commercial buildings

TL;DR: The results show that using the gradient boosting machine model improved the R-squared prediction accuracy and the CV(RMSE) in more than 80 percent of the cases, when compared to an industry best practice model that is based on piecewise linear regression, and to a random forest algorithm.
Journal Article · DOI

Landsat-8 vs. Sentinel-2: examining the added value of sentinel-2’s red-edge bands to land-use and land-cover mapping in Burkina Faso

TL;DR: The availability of moderate-to-high spatial resolution (10-30 m) satellite imagery received a major boost with the recent launch of the Sentinel-2 sensor by the European Space Agency.
References
Journal Article · DOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the method is also applicable to regression.
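The "internal estimates" referred to above are out-of-bag (OOB) error estimates: each tree is evaluated on the bootstrap samples it never saw during training, so generalization error can be monitored without a separate held-out set. A minimal sketch using scikit-learn's implementation, on illustrative synthetic data:

```python
# Out-of-bag (OOB) error estimation with a random forest regressor.
# Each tree is scored on the samples left out of its bootstrap draw,
# giving an internal generalization estimate without a validation split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=1)
rf.fit(X, y)
print("OOB R^2:", round(rf.oob_score_, 3))  # estimated without a test set
```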
Book

Pattern Recognition and Machine Learning

TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Journal Article · DOI

Pattern Recognition and Machine Learning

Radford M. Neal
01 Aug 2007
TL;DR: This book covers a broad range of topics for regular factorial designs, presents all of the material in a very mathematical fashion, and will surely become an invaluable resource for researchers and graduate students doing research on the design of factorial experiments.

Classification and Regression by randomForest

TL;DR: Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.