Audio classification using GRU
Keywords
Authors
Issue Date
2022-06-20
Language
en
Abstract
How the brain transforms real-life sound fragments into neural representations
is an active area of study, and much remains unexplained. In this
paper, inspired by Francl & McDermott (2022) and Van der Heijden &
Mehrkanoon (2020), I investigated deep recurrent neural networks (RNNs)
with gated recurrent units (GRUs) to come one step closer to understanding
auditory processing in humans. This biologically inspired recurrent neural
network is trained to predict the azimuth location of a sound as well as
the category of the sound (i.e. speech, nature, urban, music and
human sounds). Both predictions are multi-label, multi-class classification
tasks, and the performance of the model is measured using the binary
cross-entropy loss. The model is human-inspired in its architectural
design choices: the left and right channels are separate inputs, and
each classification task has its own pathway, mimicking the different areas
in the brain that perform sound localisation and identification. The model
was trained and tested on a set of approximately 50,000 one-second audio
fragments (approximately 14 hours of audio in total). Additionally, the
model was evaluated on an unseen evaluation set to ensure ecological validity.
The localisation task in particular showed results that indicate
generalisability. It also showed error patterns similar to those of
humans, as discussed in the paper. However, the identification task did not
show the same results: it neither matched human accuracy nor showed
similar error patterns. Overall, the errors of this multi-task RNN
were larger than those of humans. I suggest that, in order to conclude more
from this human-inspired GRU model, one needs to introduce more training
data. Another way to extend this research would be to explore different
types of neural networks while staying true to the biological design. For
instance, incorporating spiking neural networks (SNNs) into this research
and increasing the quantity of input data would be an interesting next step
in this field.
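
To make the evaluation measure in the abstract concrete, the following is a minimal sketch of a binary cross-entropy loss applied to two multi-label outputs, one per pathway. It is an illustration under stated assumptions, not the thesis implementation: the function name, the probability values, the four azimuth bins, and the unweighted sum of the two losses are all hypothetical, and only the five category names come from the abstract.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over independent labels (hypothetical helper)."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

# Category labels as listed in the abstract; the values below are illustrative.
CATEGORIES = ["speech", "nature", "urban", "music", "human"]
category_probs = [0.9, 0.1, 0.2, 0.7, 0.1]   # assumed sigmoid outputs of the identification pathway
category_targets = [1, 0, 0, 1, 0]           # assumed multi-hot target: speech and music

# Assumption: four azimuth bins; the abstract does not specify the bin layout.
azimuth_probs = [0.05, 0.8, 0.1, 0.05]       # assumed outputs of the localisation pathway
azimuth_targets = [0, 1, 0, 0]

category_loss = binary_cross_entropy(category_probs, category_targets)
azimuth_loss = binary_cross_entropy(azimuth_probs, azimuth_targets)
total_loss = category_loss + azimuth_loss    # unweighted sum is an assumption

print(f"category loss: {category_loss:.4f}")
print(f"azimuth loss:  {azimuth_loss:.4f}")
print(f"total loss:    {total_loss:.4f}")
```

Each label is scored independently, which is why this loss fits a multi-label setting in which several labels can be active for the same fragment.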
Description
Citation
Supervisor
Faculty
Faculteit der Sociale Wetenschappen