The Influence of Features in Emotion Speech Recognition and its Relation to Auditory Emotion Recognition in the Brain.
Issue Date
2023-03-02
Language
en
Abstract
Speech emotion recognition is a challenging problem because it is not clear which features are effective for classification. This work examines the feature extraction problem from two angles. The first is neuroscientific: which features does the brain use to recognize emotions? Auditory emotion classification is built from the information available in the auditory stimulus, whose main perceptual features are loudness, pitch, and timbre. The second angle examines a pre-trained convolutional neural network and the importance of its input features, analysed on samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The features are mel-frequency cepstral coefficients (MFCCs), chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast. We modify the initial model by removing one feature at a time to measure the effect on classification accuracy. The experimental results show that MFCCs are among the most important features for classifying emotion: removing them dropped accuracy by 29%. Removing the Tonnetz representation improved classification accuracy by 3.9%. Finally, we examine how the features used in speech emotion recognition relate to auditory emotion recognition in the brain; both processes emphasize features correlated with frequency.
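The ablation procedure described in the abstract (remove one feature, retrain, compare accuracy) can be sketched as follows. This is a minimal illustrative harness, not the thesis's actual code: the feature names follow the abstract, while `train_and_score` stands in for the (unspecified) CNN training and evaluation pipeline on RAVDESS.

```python
# Leave-one-feature-out ablation sketch. `train_and_score` is a
# user-supplied callable that trains a classifier on the given feature
# list and returns its accuracy (hypothetical stand-in for the CNN).
FEATURES = ["mfcc", "chromagram", "mel_spectrogram", "tonnetz", "spectral_contrast"]

def ablation_report(train_and_score, features=FEATURES):
    """Return the full-feature baseline accuracy and, per feature,
    the accuracy change observed when that feature is removed."""
    baseline = train_and_score(features)
    deltas = {}
    for feat in features:
        reduced = [f for f in features if f != feat]
        # Negative delta: the removed feature was helping (e.g. MFCCs);
        # positive delta: removing it improved accuracy (e.g. Tonnetz).
        deltas[feat] = train_and_score(reduced) - baseline
    return baseline, deltas
```

Each ablation retrains the model from scratch on the reduced feature set, so differences reflect the feature's contribution rather than a mismatch between the trained weights and the missing input.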
Faculty
Faculteit der Sociale Wetenschappen