Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
TLDR
This paper presents a method for tuning Maxent's parameters using presence-only data, introduces several concepts that improve Maxent's predictive accuracy and running time, and describes a new logistic output format that gives an estimate of probability of presence.
Abstract:
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
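To make the idea of a "hinge feature" more concrete, the following is a minimal sketch, not the Maxent software's own implementation. It assumes a forward hinge that is zero below a knot and then rises linearly to 1 at the variable's maximum; the function names, the evenly spaced knots, and the default number of knots are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def forward_hinge(x, knot, x_max):
    """Assumed forward hinge: 0 below `knot`, rising linearly to 1 at `x_max`."""
    x = np.asarray(x, dtype=float)
    # Values below the knot are clipped to 0, values at or above x_max to 1.
    return np.clip((x - knot) / (x_max - knot), 0.0, 1.0)


def hinge_basis(x, n_knots=5):
    """Expand one environmental variable into several hinge features.

    Knots are evenly spaced from the minimum up to, but excluding, the
    maximum, so every hinge has a non-zero width. This spacing is an
    illustrative choice. Requires x to contain at least two distinct values.
    """
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    knots = np.linspace(x_min, x_max, n_knots, endpoint=False)
    return np.column_stack([forward_hinge(x, k, x_max) for k in knots])


if __name__ == "__main__":
    temperature = np.array([2.0, 5.0, 8.0, 11.0, 14.0])  # hypothetical values
    print(hinge_basis(temperature, n_knots=3))
```

A model that is linear in these features is piecewise linear in the original variable, which is one way to read the abstract's claim that hinge features model more complex relationships in the training data.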
Citations
Journal Article
The global distribution and burden of dengue
Samir Bhatt, Peter W. Gething, Oliver J. Brady, Jane P. Messina, Andrew Farlow, Catherine L. Moyes, John M. Drake, John S. Brownstein, Anne G. Hoen, Osman Sankoh, Monica F. Myers, Dylan B. George, Thomas Jaenisch, G. R. William Wint, Cameron P. Simmons, Thomas W. Scott, Jeremy Farrar, Simon I. Hay
TL;DR: These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.
Journal Article
A statistical explanation of MaxEnt for ecologists
TL;DR: A new statistical explanation of MaxEnt is described, showing that the model minimizes the relative entropy between two probability densities defined in covariate space, which is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts.
Journal Article
A practical guide to MaxEnt for modeling species' distributions: what it does, and why inputs and settings matter
TL;DR: A detailed explanation of how MaxEnt works and a prospectus on modeling options are provided to enable users to make informed decisions when preparing data, choosing settings and interpreting output to highlight the need for making biologically motivated modeling decisions.
Journal Article
Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data
Steven J. Phillips, Miroslav Dudík, Jane Elith, Catherine H. Graham, Anthony Lehmann, John R. Leathwick, Simon Ferrier
TL;DR: It is argued that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions, and that the choice of background data can have as large an effect on predictive performance as the choice of modeling method.
Journal Article
The art of modelling range-shifting species
TL;DR: Modelling approaches are explored that aim to minimize extrapolation errors and assess predictions against prior biological knowledge to promote methods appropriate to range‐shifting species.
References
Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Book
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Journal Article
Greedy function approximation: A gradient boosting machine.
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Journal Article
Maximum entropy modeling of species geographic distributions
TL;DR: In this paper, the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data was introduced, which is a general-purpose machine learning method with a simple and precise mathematical formulation.
Journal Article
Information Theory and Statistical Mechanics. II
TL;DR: In this article, the authors consider statistical mechanics as a form of statistical inference rather than as a physical theory, and show that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle.
Related Papers (5)
Novel methods improve prediction of species' distributions from occurrence data
Jane Elith, Catherine H. Graham, Robert P. Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J. Hijmans, Falk Huettmann, John R. Leathwick, Anthony Lehmann, Jin Li, Lúcia G. Lohmann, Bette A. Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC. M. Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E. Schapire, Jorge Soberón, Stephen E. Williams, Mary S. Wisz, Niklaus E. Zimmermann