Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression

Flint C, Cearns M, Opel N, Redlich R, Mehler DMA, Emden D, Winter NR, Leenings R, Eickhoff SB, Kircher T, Krug A, Nenadic I, Arolt V, Clark S, Baune BT, Jiang X, Dannlowski U, Hahn T

Research article (journal) | Peer reviewed

Abstract

We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of N = 1,868 MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observe accuracies of up to 95%. For medium sample sizes (N = 100), accuracies up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
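
The protective effect of larger test sets described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' analysis pipeline. It assumes a hypothetical classifier whose true accuracy is fixed at 0.61 (the full-sample figure reported above) and treats every test prediction as an independent Bernoulli trial. The function names, the test-set sizes, and the number of simulated studies are illustrative choices, and the test-set size used here is not the same quantity as the total sample size N in the abstract.

```python
import random


def simulate_observed_accuracies(true_accuracy, test_size, n_studies, seed=0):
    """Simulate the test-set accuracy observed by n_studies independent studies.

    Assumption (not from the paper): each test prediction is correct with
    probability true_accuracy, independently of all other predictions.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_studies):
        # Count how many of the test_size predictions happen to be correct.
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        accuracies.append(correct / test_size)
    return accuracies


def accuracy_range(accuracies):
    """Return the lowest and highest observed accuracy across studies."""
    return min(accuracies), max(accuracies)


if __name__ == "__main__":
    # Hypothetical test-set sizes; 0.61 is the full-sample accuracy from the abstract.
    for test_size in (4, 20, 100, 1000):
        observed = simulate_observed_accuracies(0.61, test_size, n_studies=1000)
        lowest, highest = accuracy_range(observed)
        print(f"test size {test_size:>4}: lowest {lowest:.2f}, highest {highest:.2f}")
```

Under these assumptions, the range of observed accuracies narrows as the test set grows, because the standard error of an accuracy estimate shrinks with the square root of the number of test cases. Small test sets therefore leave room for occasional results far above the true value purely by chance.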

Publication details

Journal: Neuropsychopharmacology

Volume: 46

Page range: 1510-1517

Status: Published

Publication year: 2021 (6 May 2021)

Language of publication: English

DOI: 10.1038/s41386-021-01020-7

Link to full text: https://doi.org/10.1038/s41386-021-01020-7

Keywords: machine learning; neuroimaging; major depressive disorder; misestimation; overestimation; small sample size; clinical translation

Authors from the University of Münster

Arolt, Volker	Klinik für Psychische Gesundheit
Baune, Bernhard	Klinik für Psychische Gesundheit
Dannlowski, Udo	Institut für Translationale Psychiatrie
Emden, Daniel	Institut für Translationale Psychiatrie
Flint, Claas	Professur für Praktische Informatik (Prof. Jiang)
Hahn, Tim	Klinik für Psychische Gesundheit
Jiang, Xiaoyi	Professur für Praktische Informatik (Prof. Jiang)
Leenings, Ramona	Institut für Translationale Psychiatrie
Redlich, Ronny	Institut für Translationale Psychiatrie
Winter, Nils	Institut für Translationale Psychiatrie
