The Multi-Feature Tagger of English (MFTE): Rationale, description and evaluation

Open Access

Le Foll, Elen; Shakir, Muhammad

Research article (journal) | Peer reviewed

Abstract

The Multi-Feature Tagger of English (MFTE) provides a transparent and easily adaptable open-source tool for multivariable analyses of English corpora. Designed to contribute to the greater reproducibility, transparency, and accessibility of multivariable corpus studies, it comes with a simple GUI and is available both as a richly annotated Python script and as an executable file. In this article, we detail its features and how they are operationalised. The default tagset comprises 74 lexico-grammatical features, ranging from attributive adjectives and progressives to tag questions and emoticons. An optional extended tagset covers more than 70 additional features, including many semantic features, such as human nouns and verbs of causation. We evaluate the accuracy of the MFTE on a sample of 60 texts from the BNC2014 and COCA, and report precision and recall metrics for all the features of the simple tagset. We outline how the use of a well-documented, open-source tool can contribute to improving the reproducibility and replicability of multivariable studies of English.
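The abstract reports precision and recall for each feature of the tagset. As a rough illustration of what such per-feature metrics measure, the following Python sketch computes them from token-level annotations. It assumes that every token carries a set of feature tags in both a manually checked gold standard and the tagger's output, and that the two are aligned token by token. The function name `per_feature_precision_recall` and the feature labels used in the example are hypothetical, and the article's actual evaluation procedure may differ.

```python
from collections import Counter


def per_feature_precision_recall(gold, predicted):
    """Compute precision and recall for each feature tag (illustrative sketch).

    Assumption: `gold` and `predicted` are parallel lists with one set of
    feature tags per token, from a manual annotation and from the tagger.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(gold, predicted):
        for tag in pred_tags & gold_tags:   # tagger and annotator agree
            tp[tag] += 1
        for tag in pred_tags - gold_tags:   # tagger assigned a spurious tag
            fp[tag] += 1
        for tag in gold_tags - pred_tags:   # tagger missed a gold tag
            fn[tag] += 1

    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        predicted_total = tp[tag] + fp[tag]
        gold_total = tp[tag] + fn[tag]
        # None marks an undefined ratio (zero denominator).
        precision = tp[tag] / predicted_total if predicted_total else None
        recall = tp[tag] / gold_total if gold_total else None
        scores[tag] = (precision, recall)
    return scores


# Hypothetical toy data: four tokens, two illustrative feature labels.
gold = [{"attributive_adj"}, set(), {"progressive"}, {"attributive_adj"}]
pred = [{"attributive_adj"}, {"attributive_adj"}, set(), {"attributive_adj"}]

for tag, (p, r) in sorted(per_feature_precision_recall(gold, pred).items()):
    print(tag, p, r)
```

With this toy data, the attributive-adjective feature has a precision of 2/3 (one false positive) and a recall of 1.0. The progressive feature is never predicted, so its precision is undefined and its recall is 0.0. Returning `None` instead of 0 for an undefined ratio keeps "never predicted" distinct from "always wrong".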

Details about the publication

Journal: Research in Corpus Linguistics
Volume: 13
Issue: 2
Page range: 63-93
Status: Published
Release year: 2024 (12/01/2024)
DOI: 10.32714/ricl.13.02.03
Link to the full text: http://dx.doi.org/10.32714/ricl.13.02.03
Keywords: Corpus Linguistics, English Linguistics, Register Studies, Multidimensional Analysis

Authors from the University of Münster

Shakir, Muhammad
Professorship of Variationist Linguistics (Prof. Deuber)