Detecting mispronunciation of digit nine in ATC communication using keyword spotting
Air Traffic Control (ATC) communication is essential to aviation safety: even a minor mistake can lead to a disastrous event, so communication mistakes by pilots and controllers should be prevented. Previous studies have investigated which communication mistakes are made and the factors behind them. However, little is known about detecting such mistakes automatically, even though automatic detection could provide better insight into the mistakes being made and help prevent them. One common mistake is the mispronunciation of the digit nine. Therefore, this thesis aims to detect the mispronunciation of the digit nine by pilots and controllers. A keyword spotting system based on the convolutional recurrent neural network of Kim and Nam (2019) is used to detect mispronunciations of the digit nine in ATC audio fragments. Furthermore, three techniques for handling class imbalance are explored to improve model performance: random oversampling, weighted random sampling, and weighted cross-entropy loss. The results of the techniques are analyzed both individually and comparatively to determine which technique is best suited to the model and dataset. The results of this thesis indicate that the model with weighted cross-entropy loss can detect the pronunciations significantly above chance level. However, the model still needs further improvement to at least match the results of Kim and Nam (2019) and to help reduce ATC communication mistakes.
Faculteit der Sociale Wetenschappen