Automatic speech recognition: An approach on accent identification using HTK
Issue Date
2017-06-22
Language
en
Abstract
Identifying accents can be very difficult. The accents of a language vary from country to country and from region to region, and speech recognition systems face this problem just as humans do. Knowing the user's accent would make an automatic speech recognition (ASR) system more robust and improve its performance compared with a system that does not know the accent.
This study investigates whether it is possible to build an ASR system that reliably distinguishes between English and Dutch native speakers pronouncing English words with a consonant-vowel-consonant (CVC) structure. To answer this question, three ASR systems (models) were built and tested in four scenarios. The first model is trained exclusively on English data and tested on either (1) English data or (2) Dutch data. The second model is trained on data from both nationalities and tested on Dutch data (3). The third model is also trained on both nationalities, but every Dutch word receives a label distinct from that of its English equivalent; it is tested on Dutch data (4). Model one achieved an accuracy of 90.60% in scenario (1) and 55.56% in scenario (2).
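The third model's labelling idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `_NL` suffix that marks Dutch speakers' tokens, so that each CVC word gets both an English and a Dutch model. The suffix, the function names and the majority vote are illustrative choices, not the author's HTK setup. Under this scheme, the variant that the recogniser outputs also serves as an accent decision.

```python
from collections import Counter

# Hypothetical marker appended to Dutch speakers' word labels
# (an assumption; the study does not specify its label format).
DUTCH_SUFFIX = "_NL"


def relabel(word: str, nationality: str) -> str:
    """Give a Dutch speaker's token its own label, e.g. 'bed' -> 'bed_NL'."""
    return word + DUTCH_SUFFIX if nationality == "dutch" else word


def split_label(label: str) -> tuple[str, str]:
    """Map a recognised label back to (word, accent)."""
    if label.endswith(DUTCH_SUFFIX):
        return label[: -len(DUTCH_SUFFIX)], "dutch"
    return label, "english"


def label_accuracy(recognised: list[str], reference: list[str]) -> float:
    """Fraction of tokens whose recognised label matches the reference label
    exactly, i.e. both the word and its accent variant are correct."""
    hits = sum(r == ref for r, ref in zip(recognised, reference))
    return hits / len(reference)


def speaker_accent(recognised: list[str]) -> str:
    """Majority vote over the accent variants recognised for one speaker."""
    votes = Counter(split_label(label)[1] for label in recognised)
    return votes.most_common(1)[0][0]
```

For example, `speaker_accent(["bed_NL", "cat", "pit_NL"])` returns `"dutch"`. The majority vote is just one way to turn word-level outputs into a per-speaker decision. The abstract does not say whether decisions were made per word or per speaker.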
The second model achieved an accuracy of 64.36% in scenario (3), and the third model reached 70.69% in scenario (4). Model three performed significantly better than model two (p < 0.05), and model two performed significantly better than model one in scenario (2) (p < 0.05). We therefore conclude that model three can distinguish between Dutch and English speakers.

In addition, this study examines the raw data by extracting vowel formant frequencies and durations and comparing them between Dutch and English native speakers. English native speakers appear to pronounce the /e/ vowel more briefly than Dutch native speakers (0.1427 s vs. 0.2394 s, p < 0.001). Moreover, when the F1 and F2 frequencies are plotted against each other, the Dutch-male, Dutch-female, English-male and English-female groups form separate clusters. A classifier might use this information to determine the speaker's nationality.
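The abstract only suggests that the F1/F2 clusters might feed a classifier. One possible illustration, not the study's method, is a nearest-centroid classifier. It assigns a vowel token to the speaker group whose mean (F1, F2) point lies closest. The function names are hypothetical, and the formant values in the test are made-up toy numbers, not measurements from the study.

```python
import math
from statistics import fmean

Point = tuple[float, float]  # (F1, F2) in Hz


def centroids(samples: dict[str, list[Point]]) -> dict[str, Point]:
    """Mean (F1, F2) point for each speaker group, e.g. 'Dutch-male'."""
    return {
        group: (fmean(p[0] for p in pts), fmean(p[1] for p in pts))
        for group, pts in samples.items()
    }


def classify(token: Point, cents: dict[str, Point]) -> str:
    """Return the group whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda group: math.dist(token, cents[group]))
```

Because raw Hz values are used, F2 differences dominate the distance. In practice, formants are often normalised (for example per speaker) before such a comparison.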
Faculty
Faculteit der Sociale Wetenschappen