Interactive visual formula composition of multidimensional data classifiers

Derstroff, Adrian; Leistikow, Simon; Nahardani, Ali; Gruen, Katja; Franz, Marcus; Hoerr, Verena; Linsen, Lars

Research article (journal) | Peer reviewed

Abstract

Understanding how a classification result is generated and what role individual features play in the classification is crucial in many applications, particularly in medical contexts such as the translation of diagnostic biomarkers into clinical practice. The goal is to find (ideally simple) relationships between the features of multidimensional data and the classification that explain the underlying phenomenon. Mathematical formulas can express these relationships and can serve as classifiers. However, infinitely many formulas can be built from the given features, and they entail an inherent trade-off between complexity and accuracy. We present an interactive visual approach that supports domain experts in mitigating this trade-off. At the core of our approach is a novel feature selection method: formulas are composed from the selected features using symbolic regression, while state-of-the-art classifiers serve as a reference. To evaluate our approach and compare its classification performance with that of other state-of-the-art feature selection techniques, we test our methods on well-known machine learning data sets. Our evaluation shows that our feature selection method performs better than random feature selection for data sets with many features or when only a low number of generations in the symbolic regression is required. Moreover, it consistently matches or outperforms state-of-the-art methods. We also apply our approach in a case study to a hemodynamic cohort data set and report our findings and domain expert feedback. Our approach found formulas containing features that agree with the literature. We also found formulas that achieved a higher micro-averaged F1 score than established histological indices.
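The abstract compares formulas against histological indices by their micro-averaged F1 score. As a hedged illustration of that standard metric (not the authors' evaluation code), the sketch below pools true positives, false positives, and false negatives over all classes before computing a single F1 value. The function name `micro_f1` and the toy labels are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1 once.

    Hypothetical helper for illustration; assumes one label per sample.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correctly predicted this class
            elif p == label:
                fp += 1  # predicted this class, but the true class differs
            elif t == label:
                fn += 1  # true class missed by the prediction

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with three hypothetical classes.
print(micro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

For single-label multiclass data, every misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 score coincides with accuracy (here 4 of 5 correct, i.e. 0.8).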

Publication details

Journal: Information Visualization
Volume: 0
Status: Published
Year of publication: 2024 (13.09.2024)
Language of publication: English
DOI: 10.1177/14738716241270288
Link to full text: https://doi.org/10.1177/14738716241270288
Keywords: Classification; Feature Space; Formulae; Multidimensional Data; Visual Analysis

Authors from the University of Münster

Derstroff, Adrian
Chair of Practical Computer Science (Prof. Linsen)
Hörr, Verena
Clinic of Radiology, Teaching & Research
Leistikow, Simon
Chair of Practical Computer Science (Prof. Linsen)
Linsen, Lars
Chair of Practical Computer Science (Prof. Linsen)
Nahardani, Ali
Clinic of Radiology, Teaching & Research