Adaption and Evaluation of State-of-the-Art Wav2Vec2 Models on Transcribing Air Traffic Control Communication Data

Issue Date

2022-10-10

Language

en

Abstract

Air Traffic Control (ATC) can benefit greatly from Automatic Speech Recognition (ASR), as verbal communication between controllers and pilots contains essential callsigns and commands. Errors in communication can be disastrous and should thus be minimized. ASR decreases workload and improves aviation safety. ASR models designed for the ATC domain are limited; moreover, few of them use Wav2Vec2 architectures, and state-of-the-art (SOTA) models have been studied insufficiently. This work therefore set out to adapt ASR models based on SOTA Wav2Vec2 architectures, fine-tuning them and evaluating their robustness in the ATC domain. The effects of additional training data and of an in-domain language model (LM) on performance are evaluated as well. The best-performing XLS-R model achieved a Word Error Rate Reduction (WERR) of ∼95.5% and a Character Error Rate Reduction (CERR) of ∼96.1%. Additional training data was found to be negatively correlated with WERR and CERR. Applying an in-domain LM to the XLS-R models decreased WER by ∼33% and CER by ∼20%. These results indicate that a solid contribution to the field of ASR for ATC has been made, supplying fine-tuned Wav2Vec2 models in the process.
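
The abstract reports results as WER, CER, and their relative reductions without stating the formulas. The following is a minimal sketch of how these metrics are commonly computed, assuming WER and CER are Levenshtein edit distance divided by reference length, and that WERR/CERR mean the relative reduction against a baseline error rate. These definitions, all function names, and the sample transcript are assumptions for illustration and are not taken from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(baseline, improved):
    """Relative error-rate reduction (assumed meaning of WERR/CERR): (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Invented example utterance, not drawn from any ATC corpus.
    reference = "lufthansa four five six descend flight level eight zero"
    hypothesis = "lufthansa four five six descend level eight zero"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Under this reading, a WERR of ∼95.5% would mean the fine-tuned model's WER is roughly 4.5% of the baseline's WER; the thesis itself should be consulted for the exact baseline and definition used.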

Faculty

Faculteit der Sociale Wetenschappen