Exploring the sentiment analysis performance of BERT models on domain specific Twitter data when combined with an intelligent pre-processor

Issue Date

2022-06-19

Language

en

Abstract

Bidirectional Encoder Representations from Transformers (BERT) models are deep learning language models that interpret the meaning of language based on context. BERT models are widely used in Natural Language Processing (NLP) research for tasks such as Sentiment Analysis (SA). Social media platforms such as Twitter offer large quantities of data on which to run SA. However, Twitter data is very noisy, owing to extensive use of hashtags, emojis, abbreviations, and slang. This noise impairs the performance of BERT models on the SA task. There are BERT models that are pre-trained on Twitter data; however, the features labeled as noise are not included in the pre-training. A further problem arises when tweets contain a high count of niche vocabulary words that did not occur in the pre-training of the BERT models. We propose a fine-tuned, pre-trained BERT model combined with a pipeline of pre-processing methods, called the "intelligent pre-processor", to overcome these challenges. The intelligent pre-processor translates Twitter noise into a language structure that optimizes the model's performance. Domain knowledge helps the intelligent pre-processor detect niche vocabulary and replace it with common-language alternatives. The proposed model outperformed the baseline pre-trained Twitter-based BERT model on a sentiment analysis task, confirming findings of earlier research.
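
The abstract does not spell out the individual pre-processing steps, so the following is only a minimal illustrative sketch in Python of what such an intelligent pre-processor could look like. The emoji map, the hashtag, mention, and URL handling, and the domain-lexicon entries are hypothetical placeholders chosen for illustration; they are not the pipeline actually evaluated in the thesis.

    import re

    # Hypothetical domain lexicon: niche terms mapped to common-language alternatives.
    DOMAIN_LEXICON = {
        "hodl": "hold",
        "bullish": "optimistic",
        "bearish": "pessimistic",
    }

    # Small emoji-to-text map; a fuller pipeline might use a dedicated emoji library.
    EMOJI_MAP = {
        "🙂": " smiling face ",
        "😡": " angry face ",
    }

    def preprocess(tweet: str) -> str:
        """Translate common Twitter noise into plain language before tokenization."""
        text = tweet
        # Replace emojis with textual descriptions so they survive tokenization.
        for emoji, description in EMOJI_MAP.items():
            text = text.replace(emoji, description)
        # Strip the '#' from hashtags but keep the word itself.
        text = re.sub(r"#(\w+)", r"\1", text)
        # Drop user mentions and URLs, which rarely carry sentiment.
        text = re.sub(r"@\w+|https?://\S+", "", text)
        # Replace niche vocabulary with common-language alternatives.
        tokens = [DOMAIN_LEXICON.get(tok.lower(), tok) for tok in text.split()]
        return " ".join(tokens)

    if __name__ == "__main__":
        print(preprocess("Still bullish on the launch, #hodl 🙂 @trader https://t.co/xyz"))

The normalized output of such a sketch would then be fed to the fine-tuned, Twitter-based BERT model for sentiment classification.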

Faculty

Faculteit der Sociale Wetenschappen