Automatic speech recognition: An approach on accent identification using HTK

Issue Date
2017-06-22
Language
en
Abstract
Identifying accents can be very difficult. The accents of a language vary by country and region, and speech recognition systems face this problem just as humans do. Knowing the user's accent would make an automatic speech recognition (ASR) system more robust and improve its performance. This study investigates whether it is possible to build an ASR system that can reliably distinguish between native English and native Dutch speakers pronouncing English words with a consonant-vowel-consonant structure. To answer this question, three ASR systems (models) are created and tested in four scenarios. The first model is trained exclusively on English data and tested on either (1) English data or (2) Dutch data. The second model is trained on data from both nationalities and tested on Dutch data (3). The third model is also trained on both nationalities, but its labels treat every Dutch word as distinct from the equivalent English word; it is tested on Dutch data (4). Model one achieved an accuracy of 90.60% and 55.56% in scenarios (1) and (2), respectively. The second model achieved an accuracy of 64.36% in scenario (3). Finally, model three reached an accuracy of 70.69% in scenario (4). Since model three performed significantly better than model two (p < 0.05), and model two performed significantly better than model one in scenario (2) (p < 0.05), we conclude that model three can distinguish between Dutch and English speakers. In addition, this study examines the raw data by extracting the durations and formant frequencies of the vowels and comparing them between native Dutch and native English speakers. Native English speakers appear to pronounce the /e/ vowel with a shorter duration than native Dutch speakers (0.1427 s and 0.2394 s respectively, p < 0.001). Moreover, the groups Dutch-male, Dutch-female, English-male and English-female form separate clusters when the F1 and F2 frequencies are plotted against each other. A classifier might use this information to determine the nationality of the speaker.
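The abstract reports that the accuracy differences between models are significant at p < 0.05, but it names neither the statistical test nor the size of the test sets. As a minimal sketch of how two accuracies could be compared, the following uses a pooled two-proportion z-test, which is an assumption and not necessarily the test used in the thesis. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Pooled two-proportion z-test (illustrative; not confirmed as the thesis's test).

    correct_a / n_a and correct_b / n_b are the numbers of correctly
    recognised items and the test-set sizes for two models.
    Returns (z, two_sided_p).
    """
    acc_a = correct_a / n_a
    acc_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both models are equally accurate.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        # Both models are all-correct or all-wrong: no evidence of a difference.
        return 0.0, 1.0
    z = (acc_a - acc_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    z, p = two_proportion_z_test(700, 1000, 640, 1000)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Because the two models in scenarios (3) and (4) are tested on the same Dutch data, a paired test such as McNemar's test might be a better fit; the unpaired version is shown here only because it needs nothing beyond the accuracy counts.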
Faculty
Faculteit der Sociale Wetenschappen