Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

Milà C; Ludwig M; Pebesma E; Tonne C; Meyer H

Research article (journal) | Peer reviewed

Abstract

Spatial proxies, such as coordinates and Euclidean distance fields, are often added as predictors in random forest (RF) models; however, their suitability under different predictive conditions has not yet been thoroughly assessed. We investigated 1) the conditions under which spatial proxies are suitable, 2) the reasons for their suitability, and 3) how proxy suitability can be assessed using cross-validation. In a simulation and two case studies, we found that adding spatial proxies improved model performance when residual spatial autocorrelation was present and training samples were regularly or randomly distributed. Otherwise, including proxies was neutral or counterproductive, and with clustered samples it resulted in feature extrapolation. Random k-fold cross-validation systematically favoured models with spatial proxies, even when they were not appropriate. As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, and considering inherently spatial alternatives such as RF-GLS models.
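
The abstract names coordinates and Euclidean distance fields (EDFs) as typical spatial proxies. As an illustration only, the sketch below builds such predictors for a set of sample locations. It assumes one common EDF definition (distance to each corner of the study-area bounding box plus distance to its centre); the function and column names are hypothetical, and this is not the authors' code.

```python
import math


def spatial_proxies(points, bbox):
    """Build coordinate and Euclidean distance field (EDF) predictors.

    points: list of (x, y) tuples in a projected coordinate system.
    bbox:   (xmin, ymin, xmax, ymax) of the prediction area.
    Assumption (hypothetical EDF set): distance to each bounding-box
    corner plus distance to the bounding-box centre.
    """
    xmin, ymin, xmax, ymax = bbox
    references = {
        "d_lower_left": (xmin, ymin),
        "d_lower_right": (xmax, ymin),
        "d_upper_left": (xmin, ymax),
        "d_upper_right": (xmax, ymax),
        "d_centre": ((xmin + xmax) / 2, (ymin + ymax) / 2),
    }
    rows = []
    for x, y in points:
        # Raw coordinates are themselves spatial proxies.
        row = {"x": x, "y": y}
        for name, (rx, ry) in references.items():
            # Straight-line (Euclidean) distance to the reference point.
            row[name] = math.hypot(x - rx, y - ry)
        rows.append(row)
    return rows


features = spatial_proxies([(0.0, 0.0), (3.0, 4.0)], (0.0, 0.0, 6.0, 8.0))
print(features[1])
```

Here the point (3, 4) lies at the centre of a 6 by 8 box, so its centre distance is 0 and each corner distance is 5. These columns would simply be appended to the environmental predictors before fitting the random forest.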

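The abstract reports that random k-fold cross-validation favoured models with spatial proxies even when they were not appropriate. One intuition, sketched here under simple assumptions, is that with clustered samples random folds usually leave a close neighbour of each held-out point in the training set. The sketch compares the mean distance from held-out points to their nearest training point under random folds and under spatial block folds. Square-grid block CV is only one spatial CV strategy, chosen for brevity and not necessarily the scheme used in the paper; all function names and data are hypothetical.

```python
import math
import random


def random_kfold(n, k, seed=0):
    """Random k-fold: fold labels are shuffled regardless of location."""
    folds = [i % k for i in range(n)]
    random.Random(seed).shuffle(folds)
    return folds


def spatial_block_kfold(points, block_size, k, seed=0):
    """Spatial block k-fold: whole square blocks are assigned to folds,
    so samples in the same block always share a fold."""
    blocks = [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in points]
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    block_to_fold = {b: i % k for i, b in enumerate(unique_blocks)}
    return [block_to_fold[b] for b in blocks]


def mean_test_train_distance(points, folds):
    """Mean distance from each held-out point to its nearest training point,
    i.e. the nearest point in a different fold. Assumes >= 2 non-empty folds."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if folds[j] != folds[i])
    return total / len(points)


# Hypothetical clustered sample: two tight groups far apart.
pts = [(1.0, 1.0), (2.0, 2.0), (1.5, 2.5), (51.0, 51.0), (52.0, 52.0)]
random_folds = random_kfold(len(pts), k=2)
block_folds = spatial_block_kfold(pts, block_size=10.0, k=2)
print("random k-fold:", mean_test_train_distance(pts, random_folds))
print("spatial blocks:", mean_test_train_distance(pts, block_folds))
```

With spatial blocks, each cluster is held out as a whole, so held-out points are roughly 70 units from the nearest training point, closer to the situation of predicting into unsampled areas. Random folds typically keep a same-cluster neighbour in training, which tends to reward predictors such as coordinates that help interpolate between nearby samples.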
Details about the publication

Journal: Geoscientific Model Development
Volume: 2024
Issue: 17
Page range: 6007-603
Status: Published
Release year: 2024
Language in which the publication is written: English
DOI: 10.5194/gmd-17-6007-2024
Link to the full text: https://doi.org/10.5194/gmd-17-6007-2024
Keywords: Spatial modelling; Random Forest; Overfitting; Model validation

Authors from the University of Münster

Ludwig, Marvin
Professorship of Remote Sensing and Spatial Modelling
Meyer, Hanna
Professorship of Remote Sensing and Spatial Modelling
Pebesma, Edzer
Professorship of Geoinformatics (Prof. Pebesma)