Speech recognisers compared on American English sentences: A research on phone transcriptions in TIMIT

Issue Date

2024-08-30

Language

en

Abstract

This research focuses specifically on phonetic classification by two speech recognisers: a “large” wav2vec 2.0 model and a “small” OpenAI Whisper model. The research method uses ARPABET phonetic-alphabet transcriptions of the WAV files in the TIMIT dataset. The automatic speech recognition (ASR) models used in this research were pre-trained on the Librispeech dataset by their authors. The fine-tuned wav2vec 2.0 model was pre-trained on up to 960 hours of labelled speech data, and the Whisper model on 680,000 hours of speech data (Baevski et al., 2020; Radford et al., 2022).

Faculty

Faculteit der Letteren