Automatic Detection of Bot-generated Tweets

Julien Tourille; Babacar Sow; Adrian Popescu

doi:10.1145/3512732.3533584

Conference paper, Year: 2022


Julien Tourille

Role: Corresponding author

Département Intelligence Ambiante et Systèmes Interactifs

Babacar Sow

Role: Author

Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes

Adrian Popescu

Role: Author

Département Intelligence Ambiante et Systèmes Interactifs

Abstract

Deep neural networks can generate textual content that is increasingly difficult to distinguish from text written by humans. Such content can be used in disinformation campaigns, and its detrimental effects are amplified when it spreads on social networks. Here, we study the automatic detection of bot-generated Twitter messages. This task is difficult because recent deep language models perform strongly and tweets are short. In this study, we propose a challenging definition of the problem by making no assumptions about the bot account, its network, or the method used to generate the text. We devise two approaches for bot detection based on pretrained language models and create a new dataset of generated tweets to improve the performance of our classifier on recent text generation algorithms. The results show that the generalization capabilities of the proposed classifier depend heavily on the dataset used to train the model. Interestingly, the two automatic dataset augmentations proposed here show promising results: their introduction leads to consistent performance gains compared with using the original dataset alone.

Domains

Computation and Language [cs.CL]

Main file

Bot_Tweet_Detection_HAL.pdf (2.53 MB)

Origin: Files produced by the author(s)

Contributor: MAP CEA

https://cea.hal.science/cea-03788573

Submitted on: Monday, September 26, 2022, 18:17:29

Last modified on: Wednesday, April 3, 2024, 11:14:12

Long-term archiving on: Tuesday, December 27, 2022, 19:26:23

Dates and versions

cea-03788573, version 1 (26-09-2022)

Identifiers

HAL Id: cea-03788573, version 1
DOI: 10.1145/3512732.3533584

Cite

Julien Tourille, Babacar Sow, Adrian Popescu. Automatic Detection of Bot-generated Tweets. 1st ACM International Workshop on Multimedia AI against Disinformation, Jun 2022, Newark, United States. pp.44-51, ⟨10.1145/3512732.3533584⟩. ⟨cea-03788573⟩


Collections

CEA PRES_CLERMONT CNRS LIMOS DRT CEA-UPSAY UNIV-PARIS-SACLAY LIST GS-COMPUTER-SCIENCE GS-SPORT-HUMAN-MOVEMENT CLERMONT-AUVERGNE-INP
