On Domain-specific Topic Modelling Using the Case of a Humanities Journal (Open Access)

Redzuan, Nadja; Möller, Ralf; Gehrke, Marcel; Braun, Tanya

Research article in online collection (conference) | Peer reviewed

Abstract

Topic modelling techniques have been an important tool for meaningful information retrieval. They also hold the potential to support researchers in areas such as the humanities in exploring corpora of different topics in an automated way. One prominent method, latent Dirichlet allocation (LDA), describes documents as distributions over topics and topics as distributions over words. Most applications of LDA focus on sets of tweets, news articles, Wikipedia entries, or academic publications covering various topics in a large corpus. In this article, LDA is used in a rather opposite setting: a domain-specific, small-scale corpus in the form of an academic journal concerned with the study of modern and ancient manuscripts. From this case study, we infer steps specific to dealing with domain-specific corpora.
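
To make the phrase "documents as distributions over topics and topics as distributions over words" concrete, the following is a minimal sketch of the generative view behind LDA, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy vocabulary, and the hyperparameter values (alpha, beta) are hypothetical, and the sketch only generates synthetic documents; it does not perform the inference step that fits a model to a real corpus.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet distribution
    by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, num_topics, num_docs, doc_length,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate synthetic documents following the LDA generative story.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Each topic is a probability distribution over the vocabulary.
    topic_word = [sample_dirichlet([beta] * len(vocab), rng)
                  for _ in range(num_topics)]

    doc_topic = []
    documents = []
    for _ in range(num_docs):
        # Each document is a probability distribution over topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topic_word[z])[0]
            words.append(w)
        doc_topic.append(theta)
        documents.append(words)

    return topic_word, doc_topic, documents
```

Calling, for example, generate_corpus(["codex", "scroll", "ink", "scribe", "paper", "binding"], num_topics=2, num_docs=3, doc_length=8) returns the topic-word distributions, the document-topic distributions, and the generated word lists. Fitting LDA to an actual corpus runs this story in reverse, estimating both sets of distributions from the observed words.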

Publication details

Repository name: CEUR
Status: Published
Publication year: 2023
Language in which the publication is written: English
Conference: CHAI 2023 3rd Workshop on Humanities-centred AI, co-located with KI 2023, Berlin, Germany
Keywords: topic modelling; LDA; manuscript cultures

Authors from the University of Münster

Braun, Tanya

Talks related to the publication

On Domain-specific Topic Modelling Using the Case of a Humanities Journal
Braun, Tanya (26.09.2023)
CHAI 2023 3rd Workshop on Humanities-centred AI, co-located with KI 2023, Berlin
Type of talk: scientific talk