Journal Article

A protocol for data exploration to avoid common statistical problems

Abstract
Summary

1. While teaching statistics to ecologists, the lead authors of this paper have noticed common statistical problems. If a random sample of their work (including scientific papers) produced before doing these courses were selected, half would probably contain violations of the underlying assumptions of the statistical techniques employed.

2. Some violations have little impact on the results or ecological conclusions; yet others increase type I or type II errors, potentially resulting in wrong ecological conclusions. Most of these violations can be avoided by applying better data exploration. These problems are especially troublesome in applied ecology, where management and policy decisions are often at stake.

3. Here, we provide a protocol for data exploration; discuss current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables; and provide advice on how to address these problems when they arise. We also address misconceptions about normality, and provide advice on data transformations.

4. Data exploration avoids type I and type II errors, among other problems, thereby reducing the chance of making wrong ecological conclusions and poor recommendations. It is therefore essential for good quality management and policy based on statistical analyses.
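Three of the checks named in the summary (outliers, zero inflation, and collinearity) can be illustrated with a few lines of code. The sketch below is only an illustration under stated assumptions, not the authors' protocol: the function names and the toy data are hypothetical, the boxplot (1.5 × IQR) rule stands in for whichever outlier tool is preferred, and the variance inflation factor is computed with the two-predictor shortcut VIF = 1 / (1 - r²), which does not generalise to more than two covariates.

```python
from statistics import mean, quantiles


def zero_fraction(counts):
    """Share of observations equal to zero: a first look at possible zero inflation."""
    return sum(1 for c in counts if c == 0) / len(counts)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], i.e. the usual boxplot rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by two predictors: 1 / (1 - r^2). Undefined when |r| == 1."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Hypothetical toy data, invented purely for illustration.
    abundance = [0, 0, 3, 1, 0, 5, 0, 2]
    body_length = [2, 3, 3, 4, 4, 5, 5, 6, 40]
    temperature = [1, 2, 3, 4, 5]
    depth = [2, 1, 4, 3, 5]

    print("fraction of zeros:", zero_fraction(abundance))
    print("flagged as outliers:", iqr_outliers(body_length))
    print("VIF (temperature, depth):", vif_two_predictors(temperature, depth))
```

A flagged value or a large VIF is a prompt to look more closely at the data rather than an automatic reason to drop an observation or a covariate; any cut-off applied to these numbers is a judgment call.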

