Robust statistical boosting with quantile-based adaptive loss functions

Speller, Jan; Staerk, Christian; Mayr, Andreas

Research article (journal) | Peer reviewed

Abstract

We combine robust loss functions with statistical boosting algorithms in an adaptive way to perform variable selection and predictive modelling for potentially high-dimensional biomedical data. To achieve robustness against outliers in the outcome variable (vertical outliers), we consider different composite robust loss functions together with base-learners for linear regression. For composite loss functions, such as the Huber loss and the Bisquare loss, a threshold parameter has to be specified that controls the robustness. In the context of boosting algorithms, we propose an approach that adapts the threshold parameter of composite robust losses in each iteration to the current sizes of residuals, based on a fixed quantile level. We compared the performance of our approach to classical M-regression, boosting with standard loss functions or the lasso regarding prediction accuracy and variable selection in different simulated settings: the adaptive Huber and Bisquare losses led to a better performance when the outcome contained outliers or was affected by specific types of corruption. For non-corrupted data, our approach yielded a similar performance to boosting with the efficient L2 loss or the lasso. Also in the analysis of skewed KRT19 protein expression data based on gene expression measurements from human cancer cell lines (NCI-60 cell line panel), boosting with the new adaptive loss functions performed favourably compared to standard loss functions or competing robust approaches regarding prediction accuracy and resulted in very sparse models.
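The quantile-based threshold adaptation described in the abstract can be illustrated with a minimal sketch. It assumes component-wise gradient boosting with simple linear base-learners and the Huber loss only; the function names (`adaptive_huber_boost`, `huber_negative_gradient`), the parameters (`tau`, `step`, `n_iter`) and their defaults are hypothetical and are not taken from the authors' implementation. As a further simplification, the offset is fixed at the median of the outcome and is not updated during boosting.

```python
import numpy as np


def huber_negative_gradient(residuals, delta):
    """Negative gradient of the Huber loss: residuals clipped to [-delta, delta]."""
    return np.clip(residuals, -delta, delta)


def adaptive_huber_boost(X, y, n_iter=300, step=0.1, tau=0.8):
    """Component-wise boosting with a Huber threshold re-estimated per iteration.

    Illustrative sketch only. `tau` is the fixed quantile level: in every
    iteration the threshold delta is set to the tau-quantile of the current
    absolute residuals. Assumes no column of X is constant.
    """
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centred covariates
    col_ss = (Xc ** 2).sum(axis=0)       # squared norm of each column

    offset = np.median(y)                # robust starting value (kept fixed here)
    coef = np.zeros(p)
    fit = np.full(n, offset, dtype=float)

    for _ in range(n_iter):
        residuals = y - fit
        # Adapt the threshold to the current sizes of the residuals.
        delta = np.quantile(np.abs(residuals), tau)
        u = huber_negative_gradient(residuals, delta)

        # Least-squares fit of every simple linear base-learner to u.
        inner = Xc.T @ u
        betas = inner / col_ss
        # The best base-learner gives the largest reduction in residual sum of squares.
        j = int(np.argmax(inner ** 2 / col_ss))

        # Weak update of the selected coefficient only.
        coef[j] += step * betas[j]
        fit += step * betas[j] * Xc[:, j]

    # Convert back to an intercept on the uncentred covariate scale.
    intercept = offset - x_mean @ coef
    return intercept, coef
```

Because only one coefficient is updated per iteration, stopping early leaves most coefficients at exactly zero, which is how boosting performs variable selection. A Bisquare variant would replace the clipped residuals with the corresponding Bisquare gradient while keeping the same quantile rule for the threshold.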

Publication details

Journal: International Journal of Biostatistics
Volume: 19
Issue: 1
Page range: 111-119
Status: Published
Publication year: 2022 (10 August 2022)
Language of publication: English
DOI: 10.1515/ijb-2021-0127
Link to full text: https://doi.org/10.1515/ijb-2021-0127
Keywords: Bisquare loss; gradient boosting; Huber loss; robust regression

Authors from the University of Münster

Speller, Jan
Junior Professorship of Practical Computer Science - Modern Aspects of Data Processing / Data Science (Prof. Braun)