Robust statistical boosting with quantile-based adaptive loss functions

Speller, Jan; Staerk, Christian; Mayr, Andreas

Research article (journal) | Peer reviewed

Abstract

We combine robust loss functions with statistical boosting algorithms in an adaptive way to perform variable selection and predictive modelling for potentially high-dimensional biomedical data. To achieve robustness against outliers in the outcome variable (vertical outliers), we consider different composite robust loss functions together with base-learners for linear regression. For composite loss functions, such as the Huber loss and the Bisquare loss, a threshold parameter has to be specified that controls the robustness. In the context of boosting algorithms, we propose an approach that adapts the threshold parameter of composite robust losses in each iteration to the current sizes of residuals, based on a fixed quantile level. We compared the performance of our approach to classical M-regression, boosting with standard loss functions or the lasso regarding prediction accuracy and variable selection in different simulated settings: the adaptive Huber and Bisquare losses led to a better performance when the outcome contained outliers or was affected by specific types of corruption. For non-corrupted data, our approach yielded a similar performance to boosting with the efficient L2 loss or the lasso. Also in the analysis of skewed KRT19 protein expression data based on gene expression measurements from human cancer cell lines (NCI-60 cell line panel), boosting with the new adaptive loss functions performed favourably compared to standard loss functions or competing robust approaches regarding prediction accuracy and resulted in very sparse models.
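The adaptive threshold idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes component-wise boosting with simple linear base-learners on centered covariates, a median offset, and the Huber loss only. The function names (`huber_negative_gradient`, `adaptive_huber_boost`) and the default values for `n_iter`, `step`, and `tau` are hypothetical choices made for illustration. In each iteration, the threshold `delta` is reset to the `tau`-quantile of the current absolute residuals before the negative gradient is computed.

```python
import numpy as np


def huber_negative_gradient(residuals, delta):
    """Negative gradient of the Huber loss with respect to the fit.

    Equals the residual when |residual| <= delta and delta * sign(residual)
    otherwise, which is the residual clipped to [-delta, delta].
    """
    return np.clip(residuals, -delta, delta)


def adaptive_huber_boost(X, y, n_iter=200, step=0.1, tau=0.9):
    """Component-wise boosting with a quantile-adapted Huber threshold.

    Assumptions of this sketch: the columns of X are centered and non-constant,
    the offset is the median of y, and tau is a fixed quantile level in (0, 1).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    n, p = X.shape
    intercept = float(np.median(y))   # robust starting value (assumption)
    coef = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)     # sum of squares per covariate
    fit = np.full(n, intercept)

    for _ in range(n_iter):
        residuals = y - fit
        # Adapt the threshold to the current residual sizes.
        delta = np.quantile(np.abs(residuals), tau)
        u = huber_negative_gradient(residuals, delta)
        # Least-squares slope of each single-covariate base-learner fitted to u.
        betas = X.T @ u / col_ss
        # The base-learner with the smallest residual sum of squares is the one
        # with the largest explained sum of squares, betas^2 * col_ss.
        j = int(np.argmax(betas ** 2 * col_ss))
        # Update only the selected component, shrunken by the step length.
        coef[j] += step * betas[j]
        fit += step * betas[j] * X[:, j]

    return intercept, coef


# Usage on synthetic data (made up for illustration) with vertical outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X -= X.mean(axis=0)
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
y[:10] += 20.0  # corrupt 5% of the outcomes
intercept, coef = adaptive_huber_boost(X, y, n_iter=300, step=0.1, tau=0.9)
print(np.round(coef, 2))
```

Because the gradient is clipped at `delta`, observations with residuals beyond the chosen quantile contribute only a bounded amount to each update. A Bisquare variant would replace the clipping function with the corresponding redescending gradient; the number of iterations would normally be tuned, for example by cross-validation, rather than fixed as here.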

Details about the publication

Journal: International Journal of Biostatistics
Volume: 19
Issue: 1
Page range: 111-119
Status: Published
Release year: 2022 (10/08/2022)
Language in which the publication is written: English
DOI: 10.1515/ijb-2021-0127
Link to the full text: https://doi.org/10.1515/ijb-2021-0127
Keywords: Bisquare loss; gradient boosting; Huber loss; robust regression

Authors from the University of Münster

Speller, Jan
Junior professorship for practical computer science - modern aspects of data processing / data science (Prof. Braun)