Fine-tuning Transformer models for the automatic detection of Italian fake news: Multilinguality, influence of topic, & a shiny new corpus

Issue Date

2023-06-15

Language

en

Abstract

This thesis explores the automatic detection of fake news using pre-trained Transformer models. Research on this topic suffers from a lack of high-quality data, so I manually collected a mid-sized corpus of Italian articles (150 real and 150 fake news articles) and fine-tuned three different Transformer-based language models on this data. These models reached moderately good scores, especially the fine-tuned UmBERTo model (F1 = .80), which reached F1 = .82 on a test set consisting only of articles pertaining to Covid-19. I created four additional models: by fine-tuning the multilingual DistilBERT model on multilingual (English and Spanish) training data, and by fine-tuning existing models on translated data. The results show that models fine-tuned on both multilingual and monolingual training data slightly outperform the monolingual models when tested on Covid-19 data, but this advantage disappears when tested on domain-general articles. This study underlines the importance of high-quality, relevant training data when creating fake news detection models.
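The abstract gives no implementation details, but the setup it describes (fine-tuning a pre-trained Transformer for binary real/fake classification and evaluating with F1) maps onto a standard Hugging Face Transformers pipeline. The sketch below is a minimal illustration under stated assumptions: the checkpoint name, data file names, column layout, and hyperparameters are all illustrative and not the thesis's actual configuration.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint: the publicly available UmBERTo model on the Hugging
# Face hub; the abstract does not specify which UmBERTo variant was used.
MODEL_NAME = "Musixmatch/umberto-commoncrawl-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical file names; the 300-article corpus layout is an assumption
# (columns "text" and "label", with 0 = real and 1 = fake).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Truncate long articles to the model's maximum input length.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report F1, the metric used throughout the thesis.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

args = TrainingArguments(
    output_dir="umberto-fakenews",
    num_train_epochs=3,             # assumed; not reported in the abstract
    per_device_train_batch_size=8,  # assumed
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
```

Swapping MODEL_NAME for the multilingual DistilBERT checkpoint ("distilbert-base-multilingual-cased") and pointing the data files at English and Spanish articles would give the multilingual variant the abstract describes; the monolingual and multilingual runs otherwise share the same training loop.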

Faculty

Faculteit der Letteren