The role of spatio-temporal information in speech recognition: A spiking network model
Issue Date
2021-06-15
Language
en
Abstract
How the human brain recognizes speech is still not fully understood. Speech is composed of spatial
and temporal information, and neurons in the auditory cortex have been shown to be selective to
different degrees to these two dimensions [1, 2, 3, 4]. The goal of this project is to gain a better
understanding of the contribution of each of these dimensions to speech recognition. To this end,
different ways of encoding the speech sounds were compared in a Spiking Neural Network [5].
The dataset used for this task is the SpikeTIMIT dataset [6], as it contains a biologically inspired
spike encoding. In accordance with the neurological literature, it was found that both spatial and
temporal features are needed for successful speech recognition. In fact, the more spatial information
was removed from the temporal spike codes, the lower the accuracy became. However, it was also found
that the spatial information of each stimulus alone was not sufficient to distinguish between
the different labels. This can be explained by the fact that the frequency fingerprints of the phones
are very similar to one another, making temporal information a crucial component of speech sounds. These findings
support the relevance of Spiking Neural Networks and spike encodings for modelling the auditory
processes of the brain. However, these models still appear to miss some key aspects of
neurobiology.
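
The distinction between the encodings compared above can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: it assumes a spike code can be represented as a binary array of shape (channels, time bins), with the channel axis treated as the spatial (frequency) dimension, and the function names `spatial_only`, `temporal_only` and `reduce_spatial` are hypothetical.

```python
import numpy as np

def spatial_only(spikes):
    """Spike count per channel: keeps which channels fired, discards when."""
    return spikes.sum(axis=1)

def temporal_only(spikes):
    """Spike count per time bin: keeps when spikes occurred, discards where."""
    return spikes.sum(axis=0)

def reduce_spatial(spikes, group):
    """Merge each block of `group` adjacent channels with a logical OR.

    Spike timing is preserved while spatial resolution is coarsened.
    This is an assumed way of 'removing spatial information'.
    """
    n_channels, n_steps = spikes.shape
    if n_channels % group != 0:
        raise ValueError("number of channels must be divisible by group")
    blocks = spikes.reshape(n_channels // group, group, n_steps)
    return blocks.max(axis=1)

# Toy spike code: 4 channels x 5 time bins (illustrative data only).
spikes = np.array([
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
], dtype=np.uint8)

counts_per_channel = spatial_only(spikes)
counts_per_bin = temporal_only(spikes)
coarse = reduce_spatial(spikes, group=2)
```

In this toy case the per-bin counts are identical across all time bins, so the temporal-only view no longer reveals which channel fired; the per-channel counts, in turn, say nothing about spike order. Increasing `group` in `reduce_spatial` moves the code gradually from the full spatio-temporal representation towards the purely temporal one.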
Faculty
Faculteit der Sociale Wetenschappen