On Domain-specific Topic Modelling Using the Case of a Humanities Journal

Basic data for this talk

Type of talk: scientific talk
Name of speaker: Braun, Tanya
Date of talk: 26/09/2023
Talk language: English
DOI: 10.25592/uhhfdm.13423
URL of slides: https://www.fdr.uni-hamburg.de/record/13423

Information about the event

Name of the event: CHAI 2023 3rd Workshop on Humanities-centred AI, co-located with KI 2023
Event period: 26/09/2023
Event location: Berlin
Event website: https://www.csmc.uni-hamburg.de/ki2023-chai

Abstract

Topic modelling techniques have been an important tool for meaningful information retrieval. They also hold the potential to support researchers in areas such as the humanities in exploring corpora covering different topics in an automated way. One prominent method, latent Dirichlet allocation (LDA), describes documents as distributions over topics and topics as distributions over words. Most applications of LDA focus on sets of tweets, news articles, Wikipedia entries, or academic publications that cover various topics in a large corpus. In this article, LDA is used in a contrasting setting: a domain-specific, small-scale corpus in the form of an academic journal concerned with the study of modern and ancient manuscripts. From this case study, we infer steps specific to dealing with domain-specific corpora.
Keywords: topic modelling; LDA; manuscript cultures
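
The abstract's description of LDA (documents as distributions over topics, topics as distributions over words) can be illustrated with a minimal sketch of the generative side of the model. Everything here is an assumption made for illustration and is not taken from the talk: the function names `sample_dirichlet` and `generate_corpus`, the toy vocabulary, and the hyperparameter values. The sketch only shows how a corpus would be generated under the model; it does not perform the inference step that a topic-modelling study actually relies on.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.5, beta=0.5, seed=0):
    """Generate a toy corpus following the LDA generative story.

    alpha and beta are symmetric Dirichlet hyperparameters (illustrative values).
    Returns (topic_word, doc_topic, docs).
    """
    rng = random.Random(seed)

    # Each topic is a distribution over the vocabulary.
    topic_word = [
        sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)
    ]

    doc_topic = []
    docs = []
    for _ in range(n_docs):
        # Each document is a distribution over topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topic_word[z])[0]
            words.append(w)
        doc_topic.append(theta)
        docs.append(words)
    return topic_word, doc_topic, docs


if __name__ == "__main__":
    # Hypothetical domain-specific vocabulary, chosen only for illustration.
    toy_vocab = ["manuscript", "scribe", "parchment", "ink", "codex", "colophon"]
    topics, mixtures, corpus = generate_corpus(
        toy_vocab, n_topics=2, n_docs=3, doc_len=8
    )
    for theta, words in zip(mixtures, corpus):
        print([round(p, 2) for p in theta], " ".join(words))
```

Fitting LDA to a real corpus runs in the opposite direction: given only the documents, it estimates the topic-word and document-topic distributions that this sketch samples directly.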

Speakers from the University of Münster

Braun, Tanya
Junior professorship for practical computer science - modern aspects of data processing / data science (Prof. Braun)