Transfer Learning in Emotion Recognition of the Singing vs Speaking Voice

Issue Date

2023-01-25

Language

en

Abstract

Emotion recognition can be a valuable tool for applications such as voice assistants or music applications. However, the datasets available for song and speech emotion recognition are small, which is problematic because models generally perform better when trained on more data. This paper therefore proposes transfer learning to circumvent this problem: the model is trained on data from a domain that is different from, but similar to, the testing domain, so that more training data is available. Transfer learning was also used to examine whether emotion recognition generalizes between song and speech, both in the human brain and in computational models. A literature review showed that, in the human brain, the two types of emotion recognition are generalizable, as they overlap in the brain areas that are active and in the levels of acoustic parameters. Experiments with an existing convolutional neural network, however, showed that a direct transfer with the chosen model was not an effective method for song and speech emotion recognition, as it worsened performance. Training the model on a mix of song and speech data led to performance similar to that obtained without transfer learning, while enlarging the training set. Future research should investigate this finding further. Studies of more advanced transfer learning strategies, such as domain adaptation, may also lead to better results.
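The abstract contrasts three training set-ups for a target domain: training on target-domain data only (no transfer learning), a direct transfer in which the model is trained on the other domain only, and training on a mix of song and speech data. The minimal sketch below only illustrates how the training and test sets differ between these set-ups. The `Clip` type, the `make_condition` function, the condition names and the 20% held-out fraction are hypothetical; the sketch does not reproduce the thesis's convolutional neural network, datasets or data splits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Clip:
    """Hypothetical record for one labelled audio clip."""
    clip_id: str
    domain: str   # "speech" or "song"
    emotion: str  # emotion label, e.g. "happy"


# Hypothetical names for the three set-ups described in the abstract.
CONDITIONS = ("baseline", "direct_transfer", "mixed")


def make_condition(
    clips: List[Clip],
    condition: str,
    target: str = "song",
    test_fraction: float = 0.2,
) -> Tuple[List[Clip], List[Clip]]:
    """Build (train, test) sets for one hypothetical training set-up.

    The test set always comes from the target domain, so all three
    set-ups are evaluated on the same held-out clips.
    """
    if target not in ("speech", "song"):
        raise ValueError(f"unknown target domain: {target!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    source = "speech" if target == "song" else "song"
    target_clips = [c for c in clips if c.domain == target]
    source_clips = [c for c in clips if c.domain == source]

    # Hold out the first part of the target domain for testing.
    # Simplification: a real split would be shuffled, stratified by
    # emotion and speaker-independent.
    n_test = max(1, int(len(target_clips) * test_fraction))
    test = target_clips[:n_test]
    target_train = target_clips[n_test:]

    if condition == "baseline":
        # No transfer learning: train on the remaining target-domain clips.
        train = target_train
    elif condition == "direct_transfer":
        # Direct transfer: train on the other domain only.
        train = source_clips
    else:
        # Mixed: pool both domains, which enlarges the training set.
        train = source_clips + target_train
    return train, test


if __name__ == "__main__":
    clips = [Clip(f"speech_{i}", "speech", "happy") for i in range(8)]
    clips += [Clip(f"song_{i}", "song", "sad") for i in range(5)]
    for cond in CONDITIONS:
        train, test = make_condition(clips, cond)
        print(cond, len(train), len(test))
```

Run as a script with 8 speech clips and 5 song clips, it prints `baseline 4 1`, `direct_transfer 8 1` and `mixed 12 1`: the mixed set-up has the largest training set, while all three set-ups share the same song test clip.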

Faculty

Faculteit der Sociale Wetenschappen (Faculty of Social Sciences)