Speech recognisers compared on American English sentences: A research on phone transcriptions in TIMIT
Keywords
Authors
Issue Date
2024-08-30
Language
en
Document type
Abstract
This research examines explicit phonetic classification with two speech recognisers: a “large” wav2vec 2.0 model and a “small” OpenAI Whisper model. The method uses ARPABET phonetic transcriptions of the WAV files in the TIMIT dataset. The automatic speech recognition (ASR) models used in this research were pretrained on the Librispeech dataset by the models' authors. The fine-tuned wav2vec 2.0 model was pretrained on up to 960 hours of labelled speech data, and the Whisper model on 680,000 hours of speech data (Baevski et al., 2020; Radford et al., 2022).
Description
Citation
Supervisor
Faculty
Faculteit der Letteren
