Transfer Learning in Emotion Recognition of the Singing vs Speaking Voice
Issue Date
2023-01-25
Language
en
Abstract
Emotion recognition can be a valuable tool for, among others, voice assistants and music
applications. However, the datasets available for song and speech emotion recognition are
small, which is problematic because models generally perform better when trained on more
data. This paper therefore proposes transfer learning to circumvent the problem: the model
is trained on data from a domain that is different from, but similar to, the testing
domain, which makes more training data available. Transfer learning was also used to test
whether song and speech emotion recognition are generalizable, both in the human brain and
in computational models. A literature review indicated that the two types of emotion
recognition are generalizable in the human brain, as they overlap in the brain areas that
are active and in the levels of acoustic parameters. Experiments with an existing
convolutional neural network, however, showed that a direct transfer with the chosen model
was not an effective method for song and speech emotion recognition, as it worsened
performance. Training the model on a mix of song and speech data did lead to performance
similar to that obtained without transfer learning, while enlarging the training set.
Future research should therefore investigate this finding further. Studies of more
advanced transfer learning strategies, such as domain adaptation, may also lead to better
results.
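
The three training setups compared in the abstract (no transfer, direct transfer, and mixed training) differ only in which clips the model sees during training. The sketch below illustrates that difference and is not the thesis code: the Clip record, the regime names, and the toy clips are hypothetical, and the convolutional network itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical record for one labelled audio clip."""
    clip_id: str
    domain: str   # "song" or "speech"
    emotion: str  # emotion label, e.g. "happy"


def build_training_set(train_clips, target_domain, regime):
    """Select the training clips for one of three assumed regimes.

    "baseline": train only on the target domain (no transfer).
    "direct":   train only on the other domain (direct transfer).
    "mixed":    train on song and speech clips together.

    Evaluation would use held-out target-domain clips in every regime;
    those clips must not be part of train_clips.
    """
    if regime == "baseline":
        return [c for c in train_clips if c.domain == target_domain]
    if regime == "direct":
        return [c for c in train_clips if c.domain != target_domain]
    if regime == "mixed":
        return list(train_clips)
    raise ValueError(f"unknown regime: {regime!r}")


# Toy data, invented purely for illustration.
clips = [
    Clip("s1", "song", "happy"),
    Clip("s2", "song", "sad"),
    Clip("p1", "speech", "happy"),
    Clip("p2", "speech", "sad"),
    Clip("p3", "speech", "angry"),
]

# Training-set size per regime when the test domain is song.
sizes = {
    regime: len(build_training_set(clips, "song", regime))
    for regime in ("baseline", "direct", "mixed")
}
```

In this toy setup the mixed regime has the largest training set, which mirrors the abstract's point that mixing song and speech data enlarges the training set.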
Faculty
Faculteit der Sociale Wetenschappen
