scispace - formally typeset
Open AccessJournal ArticleDOI

Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation

Steven J. Phillips, +1 more
- 01 Apr 2008 - 
- Vol. 31, Iss: 2, pp 161-175
TLDR
This paper presents a tuning method that uses presence-only data for parameter tuning, and introduces several concepts that improve the predictive accuracy and running time of Maxent and describes a new logistic output format that gives an estimate of probability of presence.
Abstract
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.

read more

Citations
More filters
Journal ArticleDOI

A statistical explanation of MaxEnt for ecologists

TL;DR: A new statistical explanation of MaxEnt is described, showing that the model minimizes the relative entropy between two probability densities defined in covariate space, which is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts.
Journal ArticleDOI

A practical guide to MaxEnt for modeling species' distributions: what it does, and why inputs and settings matter

TL;DR: A detailed explanation of how MaxEnt works and a prospectus on modeling options are provided to enable users to make informed decisions when preparing data, choosing settings and interpreting output to highlight the need for making biologically motivated modeling decisions.
Journal ArticleDOI

Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data

TL;DR: It is argued that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions and as large an effect on predictive performance as the choice of modeling method.
Journal ArticleDOI

The art of modelling range-shifting species

TL;DR: Modelling approaches are explored that aim to minimize extrapolation errors and assess predictions against prior biological knowledge to promote methods appropriate to range‐shifting species.
References
More filters
Book

Elements of information theory

TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Journal ArticleDOI

Greedy function approximation: A gradient boosting machine.

TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Journal ArticleDOI

Maximum entropy modeling of species geographic distributions

TL;DR: In this paper, the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data was introduced, which is a general-purpose machine learning method with a simple and precise mathematical formulation.
Journal ArticleDOI

Information Theory and Statistical Mechanics. II

TL;DR: In this article, the authors consider statistical mechanics as a form of statistical inference rather than as a physical theory, and show that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle.
Related Papers (5)