Benchmarking Sentence Embeddings in Textual Stream Clustering with Applications to Campaign Detection

Stampe, Lucas; Lütke-Stockdiek, Janina; Grimme, Britta; Grimme, Christian

Research article in edited proceedings (conference) | Peer reviewed

Abstract

Motivated by the emergence of large language models, we benchmark sentence embeddings as representations of short texts in textual stream clustering. By adapting a non-textual stream clustering algorithm to use sentence embeddings, we achieve results comparable to those of textual stream clustering approaches that rely on other text representation mechanisms. We use benchmarking datasets with differing degrees of preprocessing. The results suggest that the chosen approach using sentence embeddings does not perform as well as previous approaches on preprocessed datasets but shows greater potential on less preprocessed datasets. This highlights the need for new and more application-oriented benchmarking datasets for stream clustering. Further, we conduct a case study in the context of social media campaign detection and show that the approaches are able to find traces of orchestrated activities.

Details about the publication

Editors: Hirose, Akira; Ishibuchi, Hisao; Jayne, Chrisina

Book title: Proceedings of the IEEE World Congress on Computational Intelligence (WCCI) - International Joint Conference on Neural Networks (IJCNN)

Page range: 1-8

Publisher: Wiley-IEEE Press

Place of publication: New Jersey

Status: Published

Release year: 2024

Language in which the publication is written: English

Conference: IEEE World Congress on Computational Intelligence, 30 June 2024 - 05 July 2024, Yokohama, Japan

ISBN: 979-8-3503-5931-2

DOI: 10.1109/IJCNN60899.2024.10650595

Link to the full text: https://ieeexplore.ieee.org/abstract/document/10650595

Keywords: stream clustering; embeddings; benchmark

Authors from the University of Münster

Grimme, Christian	Research Group Computational Social Science and Systems Analysis (CSSSA)
Lütke-Stockdiek, Janina Susanne	Research Group Computational Social Science and Systems Analysis (CSSSA)
Stampe, Lucas	Research Group Computational Social Science and Systems Analysis (CSSSA)
