Interactive visual formula composition of multidimensional data classifiers

Derstroff, Adrian; Leistikow, Simon; Nahardani, Ali; Gruen, Katja; Franz, Marcus; Hoerr, Verena; Linsen, Lars

Research article (journal) | Peer reviewed

Abstract

Understanding how a classification result is generated and what role individual features play in the classification is crucial in many applications, in particular in medical contexts such as the translation of diagnostic biomarkers into clinical practice. The goal is to find (ideally simple) relationships between the features of multidimensional data and the classification that explain the underlying phenomenon. Mathematical formulas can express these relationships and can serve as classifiers. However, infinitely many mathematical formulas can be built from the given features, and they involve an inherent trade-off between complexity and accuracy. We present an interactive visual approach that helps domain experts mitigate this trade-off. Core to our approach is a novel feature selection method, from whose selected features formulas are composed using symbolic regression, while state-of-the-art classifiers serve as a reference. To evaluate our approach and compare its classification performance with that of other state-of-the-art feature selection techniques, we test our methods on well-known machine learning data sets. Our evaluation shows that our feature selection method performs better than randomly selecting features for data sets with many features or when only a low number of generations in the symbolic regression is feasible. Moreover, it consistently matches or outperforms state-of-the-art methods. Finally, we apply our approach in a case study on a hemodynamic cohort data set and report our findings and domain expert feedback. Our approach found formulas containing features that agree with the literature. We also found formulas that achieved a higher micro-averaged F1 score than established histological indices.
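The abstract compares formulas against histological indices using the micro-averaged F1 score. As a minimal illustration of that metric (a generic sketch, not the authors' evaluation code), micro-averaging pools true positives, false positives, and false negatives over all classes before computing a single F1 value. For single-label multiclass predictions this equals plain accuracy. The function name `micro_f1` and the toy labels are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 (hypothetical helper): pool TP/FP/FN over all classes, then compute one F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    # Count, per class, each sample as TP, FP, or FN and add to the pooled totals.
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with three classes: 3 of 4 predictions are correct.
print(micro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"]))  # 0.75
```

Because every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), pooled precision and recall coincide, which is why the result matches the accuracy of 3/4 here.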

Details about the publication

Journal: Information Visualization
Volume: 0
Status: Published
Release year: 2024 (13/09/2024)
Language in which the publication is written: English
DOI: 10.1177/14738716241270288
Link to the full text: https://doi.org/10.1177/14738716241270288
Keywords: Classification; Feature Space; Formulae; Multidimensional Data; Visual Analysis

Authors from the University of Münster

Derstroff, Adrian
Professorship for Practical Computer Science (Prof. Linsen)
Hörr, Verena
Clinic of Radiology
Leistikow, Simon
Professorship for Practical Computer Science (Prof. Linsen)
Linsen, Lars
Professorship for Practical Computer Science (Prof. Linsen)
Nahardani, Ali
Clinic of Radiology