The Influence of Features in Emotion Speech Recognition and its Relation to Auditory Emotion Recognition in the Brain.

Issue Date

2023-03-02

Language

en

Abstract

Speech emotion recognition is a challenging problem because it is not clear which features are effective for classification. This work examines the feature extraction problem from two approaches. The first takes a neuroscience perspective and investigates which features the brain uses to recognize emotions: auditory emotion classification is constructed from the information available in the auditory stimuli, whose main perceptual features are loudness, pitch, and timbre. The second approach examines a pre-trained convolutional neural network and the importance of its input features. These features are analysed using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The features are mel-frequency cepstral coefficients (MFCCs), chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast. We modify our initial model by removing one feature at a time to measure the effect on classification accuracy. Based on the experimental results, MFCCs emerged as one of the most important features for classifying emotion: removing them dropped performance accuracy by 29%. Removing the Tonnetz representation improved classification accuracy by 3.9%. Finally, we examine how the features used in speech emotion recognition relate to auditory emotion recognition in the brain. The outcome is that both processes emphasize features correlated with frequency.
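The abstract does not name the extraction toolchain, but librosa provides standard implementations of exactly these five representations, so the following is a minimal sketch of how the extraction and leave-one-out ablation could look. The function name, the n_mfcc=40 setting, and the time-averaging step are illustrative assumptions, not the thesis's actual pipeline.

import numpy as np
import librosa

def extract_features(path, drop=None):
    """Load one RAVDESS clip and stack the five feature sets from the abstract.

    drop: optionally omit one of "mfcc", "chroma", "mel", "contrast",
    "tonnetz" to reproduce the leave-one-feature-out ablation idea.
    """
    y, sr = librosa.load(path)
    feats = {
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),       # MFCCs
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr),          # chromagram
        "mel": librosa.feature.melspectrogram(y=y, sr=sr),          # mel-scale spectrogram
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr),  # spectral contrast
        "tonnetz": librosa.feature.tonnetz(                         # Tonnetz representation
            y=librosa.effects.harmonic(y), sr=sr),
    }
    # Average each representation over time to obtain a fixed-length vector,
    # leaving out the dropped feature set (if any).
    kept = [f.mean(axis=1) for name, f in feats.items() if name != drop]
    return np.hstack(kept)

Usage under these assumptions: extract_features("clip.wav") yields the full feature vector, and extract_features("clip.wav", drop="tonnetz") yields the ablated variant whose classifier accuracy would be compared against the baseline.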

Faculty

Faculteit der Sociale Wetenschappen