Assessing the effect of prevalence on the predictive performance of species distribution models using simulated data

Santika, Truly ORCID: 0000-0002-3125-9467 (2010) Assessing the effect of prevalence on the predictive performance of species distribution models using simulated data. Global Ecology and Biogeography, 20 (1). pp. 181-192. ISSN 1466-822X (Print), 1466-8238 (Online) (doi:https://doi.org/10.1111/j.1466-8238.2010.00581.x)

Abstract

Aim. The proportion of sampled sites where a species is present is known as prevalence. Empirical studies have shown that prevalence can affect the predictive performance of species distribution models. This paper uses simulated species data to examine how prevalence and the form of species environmental dependence affect the assessment of the predictive performance of models.

Methods. Simulated species data were based on various functions of simulated environmental data with differing degrees of spatial correlation. Seven model performance measures – sensitivity, specificity, class‐average (CA), overall prediction success, kappa (κ), normalized mutual information (NMI) and area under the receiver operating characteristic curve (AUC) – were applied to species models fitted by three regression methods. The response of the performance measures to prevalence was then assessed. Three probability threshold selection methods used to convert fitted logistic model values to presence or absence were also assessed.
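As an illustration of the threshold-dependent measures listed above, the following minimal Python sketch computes sensitivity, specificity, class-average, overall prediction success and kappa from a 2 x 2 confusion matrix. It is not taken from the paper: the function names are hypothetical, class-average is assumed here to be the mean of sensitivity and specificity, and kappa is Cohen's kappa. NMI and AUC are omitted for brevity, and the sketch assumes the data contain at least one presence and one absence.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for 0/1 sequences."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


def performance_measures(tp, fp, fn, tn):
    """Return threshold-dependent measures as a dict.

    Assumes at least one observed presence and one observed absence,
    so that no denominator is zero.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # presences correctly predicted
    specificity = tn / (tn + fp)   # absences correctly predicted
    overall = (tp + tn) / n        # overall prediction success
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "prevalence": (tp + fn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumption: class-average = mean of sensitivity and specificity.
        "class_average": (sensitivity + specificity) / 2,
        "overall": overall,
        "kappa": (overall - expected) / (1 - expected),
    }


# Hypothetical counts for a species present at half of 100 sites.
measures = performance_measures(tp=40, fp=10, fn=10, tn=40)
```

Because kappa depends on the row and column totals through the chance-agreement term, it changes with prevalence even when sensitivity and specificity are held fixed, which is one way a measure can respond to prevalence.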

Results. The study shows that the extent to which prevalence affects model performance depends on the modelling technique and its degree of success in capturing dominant environmental determinants. It also depends on the statistic used to measure model performance and the probability threshold method. The response based on κ generally preferred models with medium prevalence. All performance measures were least affected by prevalence when the probability threshold was chosen to maximize predictive performance or was based directly on prevalence. In these cases, the responses based on AUC, CA and NMI generally preferred models with small or large prevalence.

Main conclusions. The effect of prevalence on the predictive performance of species distribution models has a methodological basis. Relevant factors include the success of the fitted distribution model in capturing the dominant environmental determinant, the model performance measure and the probability threshold selection method. The fixed probability threshold method yields a marked response of model performance to prevalence and is therefore not recommended. The study explains previous empirical results obtained with real data.
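The three kinds of probability threshold mentioned in the abstract (fixed, based directly on prevalence, and chosen to maximize predictive performance) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's procedure: the function names are hypothetical, the fixed value of 0.5 is an assumed default, and the maximized criterion is assumed to be sensitivity plus specificity, since the abstract does not say which criterion was used.

```python
def classify(probs, threshold):
    """Convert fitted probabilities to presence (1) or absence (0)."""
    return [1 if p >= threshold else 0 for p in probs]


def sens_plus_spec(observed, predicted):
    """Assumed criterion to maximize: sensitivity + specificity."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    return tp / (tp + fn) + tn / (tn + fp)


def fixed_threshold(observed, probs, value=0.5):
    """Fixed threshold; 0.5 is an assumed default, not from the paper."""
    return value


def prevalence_threshold(observed, probs):
    """Threshold set to the observed proportion of presences."""
    return sum(observed) / len(observed)


def maximizing_threshold(observed, probs, score=sens_plus_spec, steps=99):
    """Scan a grid of candidate thresholds and keep the best-scoring one."""
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(candidates, key=lambda t: score(observed, classify(probs, t)))


# Hypothetical rare species: present at 2 of 10 sites, low fitted values.
observed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.4]

for choose in (fixed_threshold, prevalence_threshold, maximizing_threshold):
    t = choose(observed, probs)
    print(choose.__name__, t, classify(probs, t))
```

With this hypothetical rare species, the fixed threshold predicts absence everywhere because no fitted value reaches 0.5, while the other two thresholds adapt to the low fitted probabilities. This illustrates why a fixed threshold can make measured performance respond strongly to prevalence.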

Item Type: Article
Uncontrolled Keywords: AUC, CART, class-average, GAM, GLM, kappa, normalized mutual information, species prevalence, species response curves
Subjects: S Agriculture > S Agriculture (General)
Faculty / Department / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > Natural Resources Institute
Faculty of Engineering & Science > Natural Resources Institute > Agriculture, Health & Environment Department
Last Modified: 25 Jun 2020 07:12
URI: http://gala.gre.ac.uk/id/eprint/28356
