A protocol for data exploration to avoid common statistical problems

doi:10.1111/J.2041-210X.2009.00001.X


Alain F. Zuur, +2 more

Methods in Ecology and Evolution, Vol. 1, Iss. 1, pp. 3-14 (01 Mar 2010)


TLDR

A protocol for data exploration is provided; current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables are discussed; and advice on how to address these problems when they arise is provided.

Abstract:

Summary

1. While teaching statistics to ecologists, the lead authors of this paper have noticed common statistical problems. If a random sample of their work (including scientific papers) produced before doing these courses were selected, half would probably contain violations of the underlying assumptions of the statistical techniques employed.

2. Some violations have little impact on the results or ecological conclusions; yet others increase type I or type II errors, potentially resulting in wrong ecological conclusions. Most of these violations can be avoided by applying better data exploration. These problems are especially troublesome in applied ecology, where management and policy decisions are often at stake.

3. Here, we provide a protocol for data exploration; discuss current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables; and provide advice on how to address these problems when they arise. We also address misconceptions about normality, and provide advice on data transformations.

4. Data exploration avoids type I and type II errors, among other problems, thereby reducing the chance of making wrong ecological conclusions and poor recommendations. It is therefore essential for good quality management and policy based on statistical analyses.
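Three of the checks named in the abstract (outliers, collinearity, zero inflation) can be illustrated with a minimal Python sketch. The abstract does not specify procedures, so everything here is an assumption chosen for illustration: the function names are hypothetical, the 1.5 × IQR rule is a common numeric stand-in for visual outlier inspection, and the variance inflation factor (VIF) is one widely used collinearity diagnostic. It is not a reproduction of the paper's protocol.

```python
import numpy as np


def iqr_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the usual boxplot rule)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zero_fraction(y):
    """Proportion of exact zeros in a response, a first look at zero inflation."""
    y = np.asarray(y)
    return float(np.mean(y == 0))


def variance_inflation_factors(X):
    """VIF per column: regress each covariate on all others (with intercept).

    VIF_j = 1 / (1 - R_j^2), computed here as SS_total / SS_residual.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        # A perfectly predicted covariate has an infinite VIF.
        vifs[j] = np.inf if ss_res == 0.0 else ss_tot / ss_res
    return vifs
```

As a usage example on made-up data: if a second covariate is built as the first plus a little noise, `variance_inflation_factors` returns large values for both, while an independent third covariate stays close to 1. Any cut-off for "too collinear" or "too many zeros" is a judgment call that depends on the analysis and is not fixed by this sketch.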
