Selecting textual analysis tools to classify sustainability information in corporate reporting

Maibaum, Frederik; Kriebel, Johannes; Foege, Johann Nils

Research article (journal) | Peer reviewed

Abstract

Information on firms' sustainability often partly resides in unstructured data published, for instance, in annual reports, news, and transcripts of earnings calls. In recent years, researchers and practitioners have started to extract information from these data sources using a broad range of natural language processing (NLP) methods. While there is much to be gained from these endeavors, studies that employ these methods rarely reflect upon the validity and quality of the chosen method—that is, how adequately NLP captures the sustainability information from text. This practice is problematic, as different NLP techniques lead to different results regarding the extraction of information. Hence, the choice of method may affect the outcome of the application and thus the inferences that users draw from their results. In this study, we examine how different types of NLP methods influence the validity and quality of extracted information. In particular, we compare four primary methods, namely (1) dictionary-based techniques, (2) topic modeling approaches, (3) word embeddings, and (4) large language models such as BERT and ChatGPT, and evaluate them on 75,000 manually labeled sentences from 10-K annual reports that serve as the ground truth. Our results show that dictionaries have a large variation in quality, topic models outperform other approaches that do not rely on large language models, and large language models show the strongest performance. In large language models, individual fine-tuning remains crucial. One-shot approaches (i.e., ChatGPT) have lately surpassed earlier approaches when using well-designed prompts and the most recent models.
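To illustrate the kind of evaluation the abstract describes, the following is a minimal sketch of scoring a dictionary-based classifier against manually labeled sentences. The term list, the example sentences, and the function names are hypothetical and chosen for illustration only; they are not the dictionaries, data, or code used in the study.

```python
import re

# Hypothetical keyword list for illustration; not a dictionary from the study.
SUSTAINABILITY_TERMS = {"emissions", "renewable", "climate", "recycling", "biodiversity"}


def dictionary_classify(sentence, terms=SUSTAINABILITY_TERMS):
    """Return 1 if the sentence contains any dictionary term, else 0."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return int(bool(tokens & terms))


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up sentences with made-up labels (1 = sustainability-related).
labeled = [
    ("We reduced greenhouse gas emissions across our plants.", 1),
    ("Net revenue increased due to higher unit sales.", 0),
    ("The company invests in employee health and safety programs.", 1),
    ("Changes in the interest rate climate may affect our borrowing costs.", 0),
]

y_true = [label for _, label in labeled]
y_pred = [dictionary_classify(sentence) for sentence, _ in labeled]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy sample the dictionary misses a relevant sentence that uses none of its terms and flags an unrelated sentence that uses "climate" in a financial sense, which is one way in which keyword lists can vary in quality.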

Details about the publication

Journal: Decision Support Systems
Volume: 183
Article number: 114269
Status: Published
Release year: 2024
Language in which the publication is written: English
DOI: 10.1016/j.dss.2024.114269
Link to the full text: https://www.sciencedirect.com/science/article/pii/S0167923624001027
Keywords: Sustainability; Natural language processing; Corporate reporting; Performance evaluation; ChatGPT

Authors from the University of Münster

Foege, Johann Nils
Professorship for Innovation, Strategy and Organization (Prof. Foege)
Kriebel, Johannes
Chair of Banking
Maibaum, Frederik
Professorship for Innovation, Strategy and Organization (Prof. Foege)