Novel methods improve prediction of species' distributions from occurrence data
Jane Elith, Catherine H. Graham, Robert P. Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J. Hijmans, Falk Huettmann, John R. Leathwick, Anthony Lehmann, Jin Li, Lúcia G. Lohmann, Bette A. Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob C. M. Mc Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E. Schapire, Jorge Soberón, Stephen E. Williams, Mary S. Wisz, Niklaus E. Zimmermann
TLDR
This work compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date, and found that presence-only data were effective for modelling species' distributions for many species and regions.
Abstract:
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
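The evaluation design described in the abstract (fit on presence-only records, score against independent presence-absence records) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the envelope model is a simplified BIOCLIM-style percentile envelope, the two environmental variables and all site values are synthetic, the function names are hypothetical, and AUC is used only as one common choice of evaluation statistic.

```python
import random


def fit_envelope(presence_rows, lower_q=0.05, upper_q=0.95):
    """Fit per-variable percentile bounds using presence records only."""
    n_vars = len(presence_rows[0])
    bounds = []
    for j in range(n_vars):
        values = sorted(row[j] for row in presence_rows)
        lo = values[int(lower_q * (len(values) - 1))]
        hi = values[int(upper_q * (len(values) - 1))]
        bounds.append((lo, hi))
    return bounds


def envelope_score(bounds, row):
    """Return the fraction of variables that fall inside the envelope (0 to 1)."""
    inside = sum(1 for (lo, hi), x in zip(bounds, row) if lo <= x <= hi)
    return inside / len(bounds)


def auc(scores, labels):
    """AUC: chance that a random presence site outscores a random absence site.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic data (assumption): two variables, temperature and rainfall.
rng = random.Random(0)


def suitable_site():
    return (rng.gauss(15.0, 2.0), rng.gauss(1000.0, 100.0))


def unsuitable_site():
    return (rng.gauss(25.0, 2.0), rng.gauss(400.0, 100.0))


# Training set: presence-only records, e.g. museum or herbarium localities.
training_presences = [suitable_site() for _ in range(200)]

# Independent evaluation set: surveyed sites with presence (1) or absence (0).
evaluation_sites = [suitable_site() for _ in range(100)] + [
    unsuitable_site() for _ in range(100)
]
evaluation_labels = [1] * 100 + [0] * 100

bounds = fit_envelope(training_presences)
evaluation_scores = [envelope_score(bounds, site) for site in evaluation_sites]
evaluation_auc = auc(evaluation_scores, evaluation_labels)
print(f"AUC on independent presence-absence data: {evaluation_auc:.3f}")
```

The point of the sketch is the separation of roles: the model never sees absences during fitting, and the presence-absence records are used only to measure how well the fitted model ranks occupied sites above unoccupied ones.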
Citations
Journal Article
The global distribution and burden of dengue
Samir Bhatt, Peter W. Gething, Oliver J. Brady, Jane P. Messina, Andrew Farlow, Catherine L. Moyes, John M. Drake, John S. Brownstein, Anne G. Hoen, Osman Sankoh, Monica F. Myers, Dylan B. George, Thomas Jaenisch, G. R. William Wint, Cameron P. Simmons, Thomas W. Scott, Jeremy Farrar, Simon I. Hay
TL;DR: These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.
Journal Article
Collinearity: a review of methods to deal with it and a simulation study evaluating their performance
Carsten F. Dormann, Jane Elith, Sven Bacher, Carsten M. Buchmann, Gudrun Carl, Gabriel Carré, Jaime Ricardo García Márquez, Bernd Gruber, Bruno Lafourcade, Pedro J. Leitão, Tamara Münkemüller, Colin J. McClean, Patrick E. Osborne, Björn Reineking, Boris Schröder, Andrew K. Skidmore, Damaris Zurell, Sven Lautenbach
TL;DR: It was found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection; the results highlight the value of GLM in combination with penalised methods and thresholds when omitted variables are considered in the final interpretation.
Journal Article
Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
TL;DR: This paper presents a tuning method that uses presence-only data for parameter tuning, and introduces several concepts that improve the predictive accuracy and running time of Maxent and describes a new logistic output format that gives an estimate of probability of presence.
Journal Article
Species Distribution Models: Ecological Explanation and Prediction Across Space and Time
Jane Elith, John R. Leathwick
TL;DR: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates; they are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time.
Journal Article
A working guide to boosted regression trees
TL;DR: This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model.