Scoring German alternate uses items applying large language models

Saretzki, J.; Knopf, T.; Forthmann, B.; Goecke, B.; Jaggy, A.-K.; Benedek, M.; Weiss, S.

Research article (journal) | Peer reviewed

Abstract

The alternate uses task (AUT) is the most popular measure for assessing creative potential. Since its introduction, AUT responses have been rated by humans, which is laborious and requires considerable resources. Large language models (LLMs) have shown promising performance in automatically scoring AUT responses in English as well as in other languages, but it is not clear which method works best for German data. Therefore, we investigated the performance of different LLMs for the automated scoring of German AUT responses. We compiled German data across five research groups, comprising ~50,000 responses for 15 different alternate uses objects from eight lab and online survey studies (including ~2300 participants), to examine generalizability across datasets and assessment conditions. Following a pre-registered analysis plan, we compared the performance of two fine-tuned, multilingual LLM-based approaches [Cross-Lingual Alternate Uses Scoring (CLAUS) and Open Creativity Scoring with Artificial Intelligence (OCSAI)] with the Generative Pre-trained Transformer (GPT-4) in scoring (a) the original German AUT responses and (b) the responses translated to English. We found that the LLM-based scorings were substantially correlated with human ratings, with the strongest correlations for OCSAI, followed by GPT-4 and CLAUS. Response translation, however, had no consistent positive effect. We discuss the generalizability of the results across different items and studies and derive recommendations and future directions.

Publication details

Journal: Journal of Intelligence
Volume: 13
Issue: 64
Status: Published
Year of publication: 2025
DOI: 10.3390/jintelligence13060064
Full-text link: https://doi.org/10.3390/jintelligence13060064
Keywords: creativity; divergent thinking; assessment; automated scoring; large language models; alternate uses task; German; GPT

Authors from the University of Münster

Forthmann, Boris
Professorship of Statistics and Research Methods in Psychology (Prof. Nestler)