On Domain-specific Topic Modelling Using the Case of a Humanities Journal

Open Access

Redzuan, Nadja; Möller, Ralf; Gehrke, Marcel; Braun, Tanya

Research article in digital collection (conference) | Peer reviewed

Abstract

Topic modelling techniques have been an important tool for meaningful information retrieval. They also hold the potential to support researchers in areas such as the humanities in exploring corpora of different topics in an automated way. One prominent method, latent Dirichlet allocation (LDA), describes documents as distributions over topics and topics as distributions over words. Most applications of LDA focus on sets of tweets, news articles, Wikipedia entries, or academic publications covering various topics in a large corpus. In this article, LDA is used in the opposite setting: a domain-specific, small-scale corpus in the form of an academic journal concerned with the study of modern and ancient manuscripts. From this case study, we infer steps specific to dealing with domain-specific corpora.
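
To make the two kinds of distributions mentioned in the abstract concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is an illustration only and not the authors' pipeline: the abstract does not say which inference method, hyperparameters, or preprocessing were used. The function name `lda_gibbs`, the values of `alpha` and `beta`, and the toy documents are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (vocab, theta, phi), where theta[d] is the topic distribution
    of document d and phi[k] is the word distribution of topic k.
    alpha and beta are assumed symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token to a random topic to start with.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: documents over topics, topics over words.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(count + beta) / (topic_total[k] + vocab_size * beta) for count in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Hypothetical toy corpus; the journal corpus itself is not reproduced here.
toy_docs = [
    ["manuscript", "scribe", "ink", "parchment", "scribe"],
    ["palm", "leaf", "manuscript", "ink", "binding"],
    ["catalogue", "library", "collection", "archive", "library"],
    ["archive", "collection", "catalogue", "manuscript", "library"],
]
vocab, theta, phi = lda_gibbs(toy_docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 3) for p in row])
```

Each row of `theta` and each row of `phi` is a probability distribution, which is the structure the abstract describes. With a corpus this small the topics themselves are not meaningful; the sketch only shows the shape of the model's output.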

Details about the publication

Name of the repository: CEUR
Status: Published
Release year: 2023
Language in which the publication is written: English
Conference: CHAI 2023 3rd Workshop on Humanities-centred AI, co-located with KI 2023, Berlin, Germany
Keywords: topic modelling; LDA; manuscript cultures

Authors from the University of Münster

Braun, Tanya

Talks on the publication

On Domain-specific Topic Modelling Using the Case of a Humanities Journal
Braun, Tanya (26/09/2023)
CHAI 2023 3rd Workshop on Humanities-centred AI, co-located with KI 2023, Berlin
Type of talk: scientific talk