Learning to localize and classify spoken digits: a comparison of two SNN frameworks
The human brain is highly efficient and accurate in tasks that require localizing and recognizing sounds. In the brain, information is conveyed among biological neurons through the precise timing of spikes in spike trains. Motivated by this efficient form of information processing, it is natural to try to mimic it in computational models using spiking neural networks (SNNs). In this thesis, two SNN frameworks are proposed that learn to localize and classify a set of spoken digits. Both frameworks use Legendre Memory Units (LMUs) and convolutional layers, but their overall structure differs. The first framework uses a single neural network to both classify and localize the digits, whereas the second framework uses ten sub-networks, each localizing one digit separately. Results show that the first framework performs better in terms of accuracy and computational cost, but the structure of the second framework offers more flexibility. The described frameworks could potentially be useful for modeling human speech recognition and sound localization, but they still require substantial further research before they can perform in real-world conditions.
Faculty of Social Sciences (Faculteit der Sociale Wetenschappen)