Fine-tuning Transformer models for the automatic detection of Italian fake news: Multilinguality, influence of topic, & a shiny new corpus
Issue Date
2023-06-15
Language
en
Abstract
This thesis explores the automatic detection of fake news using pre-trained Transformer models. Research on this topic suffers from a lack of high-quality data. I manually collected a mid-sized set of Italian articles (150 real and 150 fake) and fine-tuned three different Transformer-based language models on this data. These models reached moderately good scores; the fine-tuned UmBERTo model performed best (F1 = .80), particularly on a test set consisting only of articles pertaining to Covid-19 (F1 = .82). I created four additional models by fine-tuning on multilingual (English and Spanish) training data (using the multilingual DistilBERT model) and by using translated data to further fine-tune existing models. The results show that models fine-tuned on both multi- and monolingual training data slightly outperform the monolingual models when tested on Covid-19 data, but this advantage disappears when they are tested on domain-general articles. This study underlines the importance of high-quality, relevant training data when creating fake news detection models.
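The abstract does not include the thesis's actual training code, but the setup it describes (fine-tuning UmBERTo for binary real/fake classification) follows a standard Hugging Face pattern. Below is a minimal sketch of such a pipeline; the file name italian_news.csv, the column names "text" and "label", and all hyperparameters are illustrative assumptions, not the thesis's configuration.

import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Pre-trained Italian model; the multilingual runs would swap in
# "distilbert-base-multilingual-cased" instead.
MODEL = "Musixmatch/umberto-commoncrawl-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical CSV: one article per row, columns "text" and
# "label" (0 = real, 1 = fake).
data = load_dataset("csv", data_files="italian_news.csv")["train"]
data = data.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    # Truncate articles to the model's 512-token input limit.
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report binary F1, the metric used in the abstract.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="umberto-fakenews",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # F1 on the held-out split

With only 300 articles, a held-out split of this kind is small, so in practice cross-validation or a fixed topic-specific test set (as with the Covid-19 articles mentioned above) would give more reliable estimates.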
Faculty
Faculteit der Letteren
