Audio classification using GRU

Issue Date
2022-06-20
Language
en
Abstract
How the brain transforms real-life sound fragments into neural representations is an active area of research, and much remains unexplained. In this paper, inspired by Francl & McDermott (2022) and Van der Heijden & Mehrkanoon (2020), I investigated deep recurrent neural networks (RNNs) with gated recurrent units (GRUs) to come one step closer to understanding auditory processing in humans. This biologically inspired recurrent neural network is trained to predict both the azimuth location of a sound and its category (i.e. speech, nature, urban, music and human sounds). Both predictions are multi-label, multi-class classification tasks, and the performance of the model is measured using the binary cross-entropy loss. The model is human-inspired in its architectural design choices: the left and right channels are fed in as separate inputs, and each classification task has its own pathway, mimicking the different brain areas that perform sound localisation and identification. The model was trained and tested on a set of approximately 50,000 one-second audio fragments (approximately 14 hours of audio in total). Additionally, the model was evaluated on an unseen evaluation set to ensure ecological validity. The localisation task in particular showed results that indicate generalisability, and its error patterns were similar to those of humans, as discussed in the paper. The identification task, however, did not show the same results: it did not match human accuracy, nor did it show similar error patterns. Overall, the errors of this multi-task RNN were larger than those of humans. I suggest that drawing further conclusions from this human-inspired GRU model requires more training data. This research could also be extended by exploring different types of neural networks while staying true to the biological design. For instance, incorporating spiking neural networks (SNNs) and increasing the quantity of input data are interesting next steps in this field.
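As an illustration of the architecture outlined in the abstract, the following is a minimal NumPy sketch, not the implementation used in the paper. It assumes one GRU per ear, followed by a separate GRU pathway and sigmoid output layer for each task, with the two binary cross-entropy losses summed. All names and sizes (TwoPathwayModel, 16 input features, 32 hidden units, 8 azimuth classes, 20 time steps) are hypothetical; only the five sound categories come from the abstract. The weights are random and untrained, so the sketch shows the forward pass and loss only.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(p, y, eps=1e-7):
    """Binary cross-entropy averaged over the output units."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


class GRU:
    """Single-layer GRU with the standard update/reset gate equations."""

    def __init__(self, n_in, n_hidden):
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.normal(0, 0.1, (3, n_hidden, n_in))
        self.U = rng.normal(0, 0.1, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, x):
        """x: (time, n_in) -> hidden states for every step, (time, n_hidden)."""
        h = np.zeros(self.n_hidden)
        states = []
        for x_t in x:
            z = sigmoid(self.W[0] @ x_t + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x_t + self.U[1] @ h + self.b[1])
            cand = np.tanh(self.W[2] @ x_t + self.U[2] @ (r * h) + self.b[2])
            h = z * h + (1 - z) * cand
            states.append(h)
        return np.stack(states)


class TwoPathwayModel:
    """Hypothetical layout: per-ear GRUs feeding two task-specific pathways."""

    def __init__(self, n_features=16, n_hidden=32, n_azimuth=8, n_categories=5):
        self.left = GRU(n_features, n_hidden)    # left-channel input
        self.right = GRU(n_features, n_hidden)   # right-channel input
        self.loc_path = GRU(2 * n_hidden, n_hidden)  # localisation pathway
        self.id_path = GRU(2 * n_hidden, n_hidden)   # identification pathway
        self.W_loc = rng.normal(0, 0.1, (n_azimuth, n_hidden))
        self.W_id = rng.normal(0, 0.1, (n_categories, n_hidden))

    def forward(self, left, right):
        # Merge both ears per time step, then let each task read the sequence.
        merged = np.concatenate(
            [self.left.forward(left), self.right.forward(right)], axis=1
        )
        h_loc = self.loc_path.forward(merged)[-1]
        h_id = self.id_path.forward(merged)[-1]
        return sigmoid(self.W_loc @ h_loc), sigmoid(self.W_id @ h_id)


# Random stand-in for one audio fragment: 20 time steps, 16 features per ear.
model = TwoPathwayModel()
left = rng.normal(size=(20, 16))
right = rng.normal(size=(20, 16))
p_loc, p_id = model.forward(left, right)

# Illustrative targets: azimuth class 2, category 0 (e.g. speech).
y_loc = np.zeros(8)
y_loc[2] = 1.0
y_id = np.zeros(5)
y_id[0] = 1.0

loss = bce(p_loc, y_loc) + bce(p_id, y_id)
print("multi-task loss:", loss)
```

Using sigmoid outputs with binary cross-entropy treats every azimuth class and every sound category as an independent yes/no decision, which matches the multi-label framing in the abstract. How the two task losses are weighted against each other is not stated there, so the unweighted sum here is an assumption.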
Faculty
Faculteit der Sociale Wetenschappen