Adaption and Evaluation of State-of-the-Art Wav2Vec2 Models on Transcribing Air Traffic Control Communication Data
Issue Date
2022-10-10
Language
en
Abstract
Air Traffic Control (ATC) can benefit greatly from Automatic Speech Recognition (ASR),
as verbal communication between controllers and pilots contains essential callsigns and commands.
Errors in communication can be disastrous and should thus be minimized. ASR
decreases workload and improves aviation safety. ASR models designed for the ATC domain
are scarce; furthermore, Wav2Vec2 architectures are underrepresented among them, and state-of-the-art
(SOTA) models have been studied insufficiently. This work therefore set out to adapt ASR
models using SOTA Wav2Vec2 architectures, and to fine-tune and evaluate them for robustness
in the domain of ATC. The effects that additional training data and an in-domain language
model (LM) have on performance are evaluated as well. Word Error Rate Reduction (WERR)
and Character Error Rate Reduction (CERR) of ∼95.5% and ∼96.1% were achieved on the
best-performing XLS-R model. Additional training data was found to be negatively
correlated with WERR and CERR. Applying an in-domain LM to the XLS-R models decreased WER by ∼33%
and CER by ∼20%. These results indicate that a solid
contribution to the field of ASR for ATC has been made, supplying fine-tuned Wav2Vec2
models in the process.
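The metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: WER and CER are the Levenshtein edit distance between reference and hypothesis, counted over words and characters respectively and divided by the reference length. It also assumes that WERR and CERR are relative reductions against a baseline rate. The abstract does not spell out these definitions, so the formulas and all function names below are illustrative assumptions, not the thesis's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline_rate, new_rate):
    """Assumed WERR/CERR definition: (baseline - new) / baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical ATC-style utterance, invented for illustration only.
    reference = "lufthansa one two three descend flight level eight zero"
    baseline = "lufthansa one to three descent flight level eight zero"
    fine_tuned = "lufthansa one two three descent flight level eight zero"

    base_wer = wer(reference, baseline)
    tuned_wer = wer(reference, fine_tuned)
    print(f"baseline WER:   {base_wer:.3f}")
    print(f"fine-tuned WER: {tuned_wer:.3f}")
    print(f"WERR:           {relative_reduction(base_wer, tuned_wer):.1%}")
```

Under these assumed definitions, a WERR of ∼95.5% would mean that the fine-tuned model's WER is roughly 4.5% of the baseline WER.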
Faculty
Faculteit der Sociale Wetenschappen
