Enhancing XLM-RoBERTa with Active Learning: Tackling Low Resource and Class-Imbalanced Data Challenge
Issue Date
2024-08-24
Language
en
Abstract
This thesis explores two main areas in the context of multilingual sentiment analysis in
low-resource lifestyle and therapeutics settings. First, it assesses the impact of active learning
(AL) on enhancing XLM-RoBERTa’s representation of minority sentiment classes, using
contrastive and entropy-based AL strategies. Second, it compares the effectiveness of fine-tuning a
bilingual XLM-RoBERTa model with that of a transfer learning pipeline built on
monolingual models. Model evaluation across various general and class-specific
metrics shows that while AL strategies improved minority class representation,
particularly for negative sentiments, their overall impact on model performance was
constrained by the low-resource setting. Despite gains in class representation, AL models did not
surpass the performance of the baseline fine-tuned XLM-RoBERTa model. The
baseline, in turn, outperformed the transfer learning pipeline with monolingual models across all
evaluation metrics. These findings highlight the value of fine-tuning multilingual pre-trained
models for sentiment analysis and point to steps to consider when conducting AL in low-resource,
data-imbalanced settings. Limitations of the study include the small dataset size, potential
annotation biases, and the focus on single-message sentiment analysis rather than multi-turn
conversation analysis. Ethical considerations related to the Ancora Health sentiment analysis
task were also examined. Overall, this research provides insights for guiding
decisions in low-resource settings and overcoming challenges in multi-class and multilingual
sentiment analysis, particularly within the specialised domain of lifestyle and therapeutics.
Faculty
Faculteit der Sociale Wetenschappen
