Decoding of spoken words in a spiking neural network

Speech is the most common mode of human communication, which makes it an important subject of study. Word recognition is a first step toward understanding speech recognition in the brain. In this project, a spiking neural network is used to perform a word recognition task. The project's goal was to gain a better understanding of an optimal decoding pipeline that maximizes accuracy on a word recognition task with audio input. The presented decoding pipeline has multiple components. First, the features read out from the network are several variants of spike features and state features, where state features consist of the membrane potential and the adaptive current. Principal Component Analysis (PCA) was then used for dimensionality reduction. Finally, four classifiers were applied (Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine), and the classification of both words and phones was evaluated. This resulted in a maximum k-score of about 85% for words and 60% for phones. Computation time varied from a few seconds to about an hour, depending on the classifier, and was taken into account when selecting the best classifier. Overall, the best pipeline for maximizing accuracy on a word recognition task uses a 'non-averaged feature space' with PCA, followed by Random Forest classification. The study showed that state features outperformed spike features. This is notable because, although state features are related to spike features, spikes are the foundation of a spiking neural network. The findings of this project, when combined with research on other components of the spiking neural network, contribute to an improved spiking neural network for spoken words.

Keywords: spiking neural network, word recognition, decoding, classification, computation time
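The decoding pipeline described in the abstract (features, then PCA, then one of four classifiers, with accuracy and computation time compared) can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it assumes scikit-learn, replaces the spike and state features with synthetic data, and uses plain cross-validated accuracy as the score. The function names `make_synthetic_features` and `compare_classifiers`, the feature scaling step, the number of PCA components, and all classifier settings are hypothetical choices made only for this example.

```python
import time

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_features(n_samples=200, n_features=60, n_classes=4, seed=0):
    """Stand-in for network features (hypothetical): one noisy cluster per word."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_samples)
    centers = rng.normal(0.0, 2.0, size=(n_classes, n_features))
    features = centers[labels] + rng.normal(0.0, 1.0, size=(n_samples, n_features))
    return features, labels


def compare_classifiers(features, labels, n_components=10, cv=5):
    """Return {classifier name: (mean cross-validated accuracy, seconds)}."""
    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Support Vector Machine": SVC(),
    }
    results = {}
    for name, classifier in classifiers.items():
        # Scaling before PCA is an assumption of this sketch, not a stated
        # step of the original pipeline.
        pipeline = make_pipeline(
            StandardScaler(), PCA(n_components=n_components), classifier
        )
        start = time.perf_counter()
        scores = cross_val_score(pipeline, features, labels, cv=cv)
        elapsed = time.perf_counter() - start
        results[name] = (float(scores.mean()), elapsed)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_features()
    for name, (accuracy, seconds) in compare_classifiers(X, y).items():
        print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.2f}s")
```

Timing each classifier alongside its accuracy mirrors the trade-off the abstract describes, where computation time was weighed together with accuracy when choosing the classifier. The numbers produced on synthetic data say nothing about the results reported above.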
Faculteit der Sociale Wetenschappen