Can I get uhh a better WER: Challenges and Opportunities in Evaluating Conversational Speech Recognition

Issue Date
2023-08-24
Language
en
Abstract
Conversational speech recognition is a pivotal area of language technology, yet despite technological advances it remains a significant challenge. In this thesis, I argue that the only way to solve this challenge is to represent the foundations of human interaction. I examine the interactional infrastructure and resources employed in spontaneous conversation and discuss how these are represented, or neglected, in Automatic Speech Recognition (ASR). An analysis of the differences between human and ASR transcriptions shows that current state-of-the-art systems fail to accurately reflect certain essential, characteristic features of conversation: turn-taking, overlaps, and conversational words. The results of this study point towards a necessary paradigm shift, illustrating the importance of using interactional linguistics to inform both the development and the evaluation of conversational ASR systems. To address some of these limitations, a new composite metric is proposed to augment the conventional Word Error Rate (WER).
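Because the abstract measures ASR output against human transcripts using WER, the following minimal Python sketch shows how standard WER is usually computed: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It is not the composite metric proposed in the thesis, which this record does not describe. The `FILLERS` set and the `strip_fillers` helper are hypothetical, included only to illustrate that a conversational word such as "uhh" counts as an error when it is kept in the reference, and vanishes from the score when a normalization step removes it.

```python
# Hypothetical filler inventory, for illustration only (not from the thesis).
FILLERS = {"uh", "uhh", "um", "umm"}


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def strip_fillers(text: str) -> str:
    """Hypothetical normalization step that drops filler tokens."""
    return " ".join(w for w in text.split() if w not in FILLERS)


reference = "can i get uhh a better wer"
hypothesis = "can i get a better wer"  # the ASR output omits "uhh"

# One deletion out of seven reference words.
print(round(word_error_rate(reference, hypothesis), 3))  # → 0.143
# After filler removal the omission is no longer penalized.
print(round(word_error_rate(strip_fillers(reference), strip_fillers(hypothesis)), 3))  # → 0.0
```

Because insertions are counted in the numerator, WER is not bounded by 1.0: a hypothesis much longer than its reference can score above 100%.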
Faculty
Faculteit der Letteren (Faculty of Arts)