Automatic extraction of characterizing features for non-native Dutch read speech
Keywords
Authors
Issue Date
2022-01-25
Language
en
Abstract
Although finding characteristic features of atypical English speech has been the topic of
many studies, little research has been done on atypical Dutch speech. In 2008, the
JASMIN speech corpus was completed, a spoken Dutch corpus containing speech from children,
elderly people, and non-native speakers of Dutch [1]. In this thesis, non-native Dutch read
speech from JASMIN is compared to native read speech to find out which features are
characteristic of non-native Dutch speech. The approach automatically computes 103 word-level
Praat and eGeMAPS features from speech recordings and transcriptions, ranks these features
with Recursive Feature Elimination (RFE), classifies them in binary comparisons using a
Support Vector Machine (SVM), and finally evaluates them using statistical tests. In this way,
the research succeeded in automatically extracting characteristic features of non-native Dutch
read speech. In binary comparisons with native speech, 93 out of 103 features were found to
be significantly different. Two characteristic and partly overlapping sets of features were
found: the first based on the RFE ranking, the second based on a ranking by individual effect
size. Both sets support the hypotheses that a lower speaker volume and lower-order
Mel-frequency cepstral coefficients (MFCCs) are characteristic of non-native Dutch speech, and
both show indications of a slower reading pace for non-natives. Moreover, formant-related
features were prevalent in both rankings, indicating a different shape of the vocal tract
owing to differences between non-native and native pronunciation.
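
The second ranking mentioned in the abstract orders features by their individual effect size. The abstract does not say which effect size measure was used, so the minimal sketch below assumes Cohen's d with a pooled standard deviation. The feature names (`loudness`, `word_duration`), the helper functions, and all numeric values are hypothetical and invented for illustration; they are not taken from JASMIN or from the thesis.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d of sample a relative to sample b (assumed measure).

    Uses the pooled sample variance of both groups.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def rank_by_effect_size(native, non_native):
    """Return (feature, d) pairs sorted by absolute effect size, largest first."""
    scores = {
        name: cohens_d(non_native[name], native[name])
        for name in native
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy word-level feature values, invented for illustration only.
native = {
    "loudness": [0.82, 0.79, 0.85, 0.80, 0.83],
    "word_duration": [0.31, 0.29, 0.33, 0.30, 0.32],
}
non_native = {
    "loudness": [0.70, 0.68, 0.73, 0.71, 0.69],
    "word_duration": [0.33, 0.30, 0.35, 0.31, 0.34],
}

ranking = rank_by_effect_size(native, non_native)
for name, d in ranking:
    print(f"{name}: d = {d:.2f}")
```

With this sign convention, a negative d means the non-native group scores lower on that feature than the native group, and a positive d means it scores higher. Ranking each feature on its own in this way ignores interactions between features, which is one reason such a ranking can differ from an RFE ranking that scores features jointly.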
Faculty
Faculteit der Sociale Wetenschappen