Neural Networks and Glimpses for Speech-in-Noise Understanding
Issue Date
2022-07-12
Language
en
Abstract
Humans use glimpses to identify speech in noise, whereas Automatic Speech Recognition (ASR) research often relies on signal-to-noise ratios (SNRs) as a predictor of speech intelligibility. This research extends the studies by Zhu et al. and Cooke et al. by evaluating the importance of glimpses in noisy environments for the performance of an artificial neural network. The existing wav2vec 2.0 model by Baevski et al. is evaluated on both clean and noisy speech, followed by an analysis of glimpses. Results show a strong positive correlation between word accuracy and glimpse ratio, which indicates that neural networks rely on glimpses for speech-in-noise understanding. Glimpses are also shown to be a better predictor of word accuracy than signal-to-noise ratios, and to contribute more to the understanding of non-stationary noise types than of stationary ones.
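To make the glimpse idea concrete, the sketch below computes a glimpse ratio in the spirit of Cooke's glimpsing model: the fraction of spectro-temporal units in which the local speech-to-noise ratio exceeds a threshold. The STFT front end, the 3 dB criterion, and the example signals are illustrative assumptions, not the exact procedure of the thesis, which may use a different auditory filterbank and threshold.

# Minimal sketch of a glimpse-ratio computation (assumptions noted above).
import numpy as np
from scipy.signal import stft


def glimpse_ratio(speech: np.ndarray, noise: np.ndarray,
                  fs: int = 16000, threshold_db: float = 3.0) -> float:
    """Fraction of spectro-temporal units where the local SNR
    exceeds `threshold_db` (these units are the 'glimpses')."""
    # 25 ms frames with 15 ms overlap; both signals use the same STFT grid.
    _, _, S = stft(speech, fs=fs, nperseg=400, noverlap=240)
    _, _, N = stft(noise, fs=fs, nperseg=400, noverlap=240)
    eps = 1e-12  # avoid log of zero in silent units
    local_snr_db = 10 * np.log10((np.abs(S) ** 2 + eps) /
                                 (np.abs(N) ** 2 + eps))
    return float(np.mean(local_snr_db > threshold_db))


# Example: a stand-in "speech" signal mixed with white noise at 0 dB global SNR.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
noise *= np.sqrt(np.mean(speech ** 2) / np.mean(noise ** 2))  # scale to 0 dB SNR
print(f"glimpse ratio: {glimpse_ratio(speech, noise):.3f}")

In an analysis like the one described in the abstract, this ratio would be computed per noisy utterance and correlated with the word accuracy obtained from the recognizer, alongside the global SNR, to compare the two predictors.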
Faculty
Faculteit der Sociale Wetenschappen