Fine-tuning Transformer models for the automatic detection of Italian fake news: Multilinguality, influence of topic, & a shiny new corpus

Issue Date

2023-06-15

Language

en

Abstract

This thesis explores the automatic detection of fake news using pre-trained Transformer models. Research on this topic suffers from a lack of high-quality data, so I manually collected a mid-sized corpus of Italian articles (150 real and 150 fake news articles) and fine-tuned three different Transformer-based language models on this data. These models reached moderately good scores, especially the fine-tuned UmBERTo model (F1 = .80), which reached F1 = .82 on a test set consisting only of articles pertaining to Covid-19. I created four additional models: by fine-tuning the multilingual DistilBERT model on multilingual (English and Spanish) training data, and by fine-tuning existing models on translated data. The results show that models fine-tuned on both multilingual and monolingual training data slightly outperform the monolingual models when tested on Covid-19 data, but this advantage disappears when tested on domain-general articles. This study underlines the importance of high-quality, relevant training data when creating fake news detection models.
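The abstract gives no implementation details, but the setup it describes (fine-tuning a pre-trained Transformer for binary real/fake classification and evaluating with F1) maps onto a standard Hugging Face Transformers pipeline. The sketch below is a minimal illustration under stated assumptions: the checkpoint name, data file names, column layout, and hyperparameters are all illustrative and not the thesis's actual configuration.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint: the publicly available UmBERTo model on the Hugging
# Face hub; the abstract does not specify which UmBERTo variant was used.
MODEL_NAME = "Musixmatch/umberto-commoncrawl-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical file names; the 300-article corpus layout is an assumption
# (columns "text" and "label", with 0 = real and 1 = fake).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Truncate long articles to the model's maximum input length.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report F1, the metric used throughout the thesis.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

args = TrainingArguments(
    output_dir="umberto-fakenews",
    num_train_epochs=3,             # assumed; not reported in the abstract
    per_device_train_batch_size=8,  # assumed
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
```

Swapping MODEL_NAME for the multilingual DistilBERT checkpoint ("distilbert-base-multilingual-cased") and pointing the data files at English and Spanish articles would give the multilingual variant the abstract describes; the monolingual and multilingual runs otherwise share the same training loop.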

Faculty

Faculteit der Letteren