The application of transformer-based language models in WSD
Issue Date
2024-09-24
Language
en
Abstract
Word Sense Disambiguation (WSD) is the process of automatically linking semantically ambiguous information expressed through language to categorizations of senses. WSD is often based on distributional information gathered from word context, represented as vector embeddings. Transformer-based language models provide Contextualized Word Embeddings (CWEs) which enable disambiguation of polysemous and homonymous tokens. In this study, CWEs are produced by BERT for ambiguous tokens appearing in SemCor. Clustering is applied to find groups of similar CWEs. These clusters are mapped to the SemCor annotation. The effects of the entropy of sense inventories and of syntactic classes within them are tested. Additionally, word suggestions produced by RoBERTa are aggregated into ‘R-lists’ which represent each group. These are evaluated for informativity. Entropy has a significant negative effect on accuracy. Syntactic entropy has a positive effect on accuracy, but not within syntactically ambiguous words. R-lists are shown to provide a reasonable degree of informativity.
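The pipeline the abstract describes, clustering contextualized embeddings of an ambiguous word and mapping each cluster to the majority gold sense, can be sketched as follows. This is an illustrative sketch only, not the thesis code: it uses synthetic stand-in vectors instead of real BERT CWEs, invented sense labels for a homonym ("bank"), and scikit-learn's KMeans as one plausible clustering choice. It also computes the Shannon entropy of the gold sense distribution, the quantity whose effect on accuracy the abstract reports.

```python
import math
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins for contextualized word embeddings (CWEs) of "bank":
# two tight, well-separated blobs, one per gold sense annotation.
emb_river = rng.normal(loc=-5.0, scale=0.5, size=(40, 16))
emb_money = rng.normal(loc=5.0, scale=0.5, size=(40, 16))
X = np.vstack([emb_river, emb_money])
gold = ["bank%river"] * 40 + ["bank%money"] * 40

# Cluster the embeddings (k = number of attested senses, here 2).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Map each cluster to the majority gold sense among its members,
# then score how often that mapping recovers the annotation.
cluster_to_sense = {
    c: Counter(g for g, l in zip(gold, km.labels_) if l == c).most_common(1)[0][0]
    for c in set(km.labels_)
}
pred = [cluster_to_sense[l] for l in km.labels_]
accuracy = sum(p == g for p, g in zip(pred, gold)) / len(gold)

def sense_entropy(senses):
    """Shannon entropy (bits) of a sense distribution."""
    counts = Counter(senses)
    n = len(senses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(accuracy)             # perfectly separable synthetic data -> 1.0
print(sense_entropy(gold))  # uniform two-sense inventory -> 1.0 bit
```

On real SemCor data the embeddings would come from a BERT forward pass over each sentence (and senses are rarely balanced), but the cluster-to-sense mapping and the entropy measure work the same way.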
Faculty
Faculteit der Letteren