The role of spatio-temporal information in speech recognition: A spiking network model

Issue Date

2021-06-15

Language

en

Abstract

How humans process speech is still not fully understood. Speech is composed of spatial and temporal information, and neurons in the auditory cortex have been shown to be selective, to different degrees, to these two dimensions [1, 2, 3, 4]. The goal of this project is to better understand the contribution of each of these dimensions to speech recognition. To this end, different ways of encoding the speech sounds were compared in a Spiking Neural Network [5]. The dataset used for this task is the SpikeTIMIT dataset [6], as it contains a biologically inspired spike encoding. In accordance with the neurological literature, it was found that both spatial and temporal features are needed for successful speech recognition. In fact, the more spatial information was removed from the temporal spike codes, the worse the accuracy became. However, it was also found that the spatial information of each stimulus alone was not sufficient to distinguish between the different labels. This can be explained by the fact that the frequency fingerprints of the phones are very similar, making temporal information a crucial component of speech sounds. These findings support the view that Spiking Neural Networks and spike encodings are relevant for modeling the auditory processes of the brain. However, it seems that these models are still missing some key aspects of neurobiology.
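
As a rough illustration of the kinds of encodings contrasted above, the sketch below derives a spatial-only code, a temporal-only code, and a reduced-spatial-resolution code from a toy spike raster. It is not the pipeline used in the thesis: the raster layout (frequency channels by time bins), the function names, and the channel-merging scheme are all assumptions made for this example.

```python
import numpy as np

# Assumption: a spike raster is a 2-D integer array of shape
# (n_channels, n_time_bins), where rows stand for frequency channels
# ("spatial" axis) and columns for time bins ("temporal" axis).

def spatial_only(raster):
    """Collapse time: spike count per channel (a frequency fingerprint)."""
    return raster.sum(axis=1)

def temporal_only(raster):
    """Collapse channels: population spike count per time bin."""
    return raster.sum(axis=0)

def reduce_spatial_resolution(raster, group_size):
    """Merge adjacent channels in groups of `group_size`, keeping timing.

    Channels that do not fill a complete group are dropped (a
    simplification chosen for this sketch).
    """
    n_channels, n_bins = raster.shape
    usable = n_channels - n_channels % group_size
    grouped = raster[:usable].reshape(usable // group_size, group_size, n_bins)
    return grouped.sum(axis=1)

# Toy raster: 4 channels x 5 time bins (hypothetical data).
raster = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
])

fingerprint = spatial_only(raster)            # one value per channel
envelope = temporal_only(raster)              # one value per time bin
coarse = reduce_spatial_resolution(raster, 2) # 2 merged channels x 5 bins
```

Two different phones could yield nearly identical `fingerprint` vectors while differing in `envelope` or in the full raster, which is one way to picture why a spatial-only code might fail to separate the labels.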

Faculty

Faculteit der Sociale Wetenschappen