Open Access · Journal Article · DOI

A comparison of random forests, boosting and support vector machines for genomic selection

TL;DR: The predictive accuracy of random forests, stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers was evaluated, and the utility of RF for ranking the predictive importance of markers (for pre-screening markers or discovering chromosomal locations of QTLs) was explored.
Abstract
Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs. We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes. The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF. Accuracy was highest for boosting, intermediate for SVMs and lowest for RF but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).
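The evaluation pipeline described in the abstract (three learners, 5-fold cross-validation, Pearson correlation as the accuracy metric, and RF importance scores for ranking markers) can be sketched with scikit-learn. This is a hedged illustration on simulated placeholder data and default-style hyperparameters, not the paper's actual QTLMAS 2010 dataset or tuning:

```python
# Sketch of the comparison described in the abstract: RF, boosting and SVM
# regressors scored by Pearson correlation under 5-fold cross-validation,
# plus RF marker-importance ranking. Data below is a synthetic stand-in
# for the QTLMAS 2010 dataset (SNP genotypes coded 0/1/2, 10 simulated QTLs).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_individuals, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
effects = np.zeros(n_markers)
effects[:10] = rng.normal(0, 1, 10)                 # 10 simulated QTLs
y = X @ effects + rng.normal(0, 1, n_individuals)   # trait = signal + noise

models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "Boosting": GradientBoostingRegressor(n_estimators=200, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0),
}

mean_corr = {}
for name, model in models.items():
    corrs = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        corrs.append(np.corrcoef(pred, y[test])[0, 1])  # Pearson correlation
    mean_corr[name] = float(np.mean(corrs))
    print(f"{name}: mean r = {mean_corr[name]:.3f}")

# Marker importance ranking with RF, as the paper uses to locate QTLs
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("Top-ranked markers:", sorted(int(i) for i in top))
```

The cross-validated Pearson correlation mirrors the paper's accuracy measure; on the real dataset, boosting ranked highest (0.547), followed by SVMs (0.497) and RF (0.483).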



Citations
Journal Article · DOI

M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

TL;DR: M-CAP is developed, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
Journal Article · DOI

Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions

TL;DR: A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTLs to stresses, enabling the modeling of quantitative trait loci by environment interaction (Q*E) on a genome-wide scale.
Journal Article · DOI

Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions

TL;DR: The elastic net, lasso, adaptive lasso and the adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP in terms of the Pearson correlation between predicted GEBVs and the true genomic value as well as the root mean squared error.
Journal Article · DOI

Gradient boosting machine for modeling the energy consumption of commercial buildings

TL;DR: The results show that using the gradient boosting machine model improved the R-squared prediction accuracy and the CV(RMSE) in more than 80 percent of the cases, when compared to an industry best practice model that is based on piecewise linear regression, and to a random forest algorithm.
Journal Article · DOI

Landsat-8 vs. Sentinel-2: examining the added value of sentinel-2’s red-edge bands to land-use and land-cover mapping in Burkina Faso

TL;DR: The availability of moderate-to-high spatial resolution (10-30 m) satellite imagery received a major boost with the recent launch of the Sentinel-2 sensor by the European Space Agency.
References
Journal Article · DOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the method is also applicable to regression.
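The "internal estimates" referred to above are out-of-bag (OOB) error estimates: each tree is evaluated on the bootstrap samples it never saw during training, so generalization error can be monitored without a separate held-out set. A minimal sketch using scikit-learn's implementation, on illustrative synthetic data:

```python
# Out-of-bag (OOB) error estimation with a random forest regressor.
# Each tree is scored on the samples left out of its bootstrap draw,
# giving an internal generalization estimate without a validation split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=1)
rf.fit(X, y)
print("OOB R^2:", round(rf.oob_score_, 3))  # estimated without a test set
```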
Book

Pattern Recognition and Machine Learning

TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Journal Article · DOI

Pattern Recognition and Machine Learning

Radford M. Neal
01 Aug 2007
TL;DR: This book covers a broad range of topics for regular factorial designs, presents all of the material in a very mathematical fashion, and will surely become an invaluable resource for researchers and graduate students doing research on the design of factorial experiments.

Classification and Regression by randomForest

TL;DR: Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.