Enhancing XLM-RoBERTa with Active Learning: Tackling Low Resource and Class-Imbalanced Data Challenge

Issue Date

2024-08-24

Language

en

Abstract

This thesis explores two main questions in multilingual sentiment analysis for low-resource settings in the lifestyle and therapeutics domain. First, it assesses the impact of active learning (AL) on XLM-RoBERTa's representation of minority sentiment classes, using two AL strategies: contrastive and entropy-based. Second, it compares fine-tuning a bilingual XLM-RoBERTa model with a transfer learning pipeline built on monolingual models. Evaluation across general and class-specific metrics shows that, while the AL strategies improved minority class representation, particularly for negative sentiment, their overall impact on model performance was constrained by the limited amount of data. Despite these gains in class representation, the AL models did not surpass the baseline fine-tuned XLM-RoBERTa model. The baseline, in turn, outperformed the transfer learning pipeline with monolingual models across all evaluation metrics. These findings highlight the value of fine-tuning multilingual pre-trained models for sentiment analysis and identify steps to consider when applying AL in low-resource, class-imbalanced settings. Limitations of the study include the small dataset, potential annotation biases, and the focus on single-message sentiment analysis rather than multi-turn conversation analysis. Ethical considerations related to the Ancora Health sentiment analysis task were also examined. Overall, this research offers guidance for decisions in low-resource settings and for addressing challenges in multi-class, multilingual sentiment analysis, particularly within the specialised domain of lifestyle and therapeutics.
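The abstract names entropy-based AL without describing how it works. Purely as an illustration, and not as the thesis's own implementation, the sketch below shows generic entropy-based uncertainty sampling. Each unlabelled message is scored by the Shannon entropy of the model's predicted class probabilities, and the highest-entropy messages are chosen for annotation. The function names, the three-class label set, and the probability values are hypothetical, and the contrastive strategy is not covered.

```python
import math
from typing import List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabelled examples with the highest entropy."""
    scores = [prediction_entropy(p) for p in pool_probs]
    # Rank pool indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical softmax outputs for four unlabelled messages over
    # three sentiment classes (negative, neutral, positive).
    pool = [
        [0.05, 0.90, 0.05],  # confident: neutral
        [0.40, 0.35, 0.25],  # uncertain
        [0.10, 0.10, 0.80],  # fairly confident: positive
        [0.34, 0.33, 0.33],  # near-uniform: most uncertain
    ]
    print(select_most_uncertain(pool, budget=2))  # → [3, 1]
```

In a full AL loop, the selected messages would be labelled, added to the training set, and the model fine-tuned again before the next selection round. Note that plain uncertainty sampling does not explicitly target minority classes, which is one reason its effect on class imbalance can be limited.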

Faculty

Faculteit der Sociale Wetenschappen