Wav2vec 2.0: Contrastive Analyses in end-to-end Transformer Models

Issue Date

2023-10-25

Language

en

Abstract

End-to-end automatic speech recognition (ASR) systems built on a transformer architecture, such as wav2vec 2.0, have become widely available for automatic speech transcription. These systems take speech, in the form of audio files, as input and convert it to written text, usually as grapheme sequences. Because the input is continuous audio and the output is discrete text, the system has to bridge this transition somewhere in its processing. We study the transition in the transformer block of wav2vec 2.0 using probing classifiers, tracking the gradual conversion through probability vectors and through the information stored in individual layers of the transformer block. These insights show how the system handles the transition, and we visualize how grapheme activation differs across the ASR system.
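As an illustration of the layer-wise probing idea described in the abstract, the following is a minimal sketch and not the procedure used in the thesis. It assumes that a probe is a linear softmax classifier trained separately on the frame-level representations of each layer, and that its output is a probability vector over grapheme classes. The data are synthetic stand-ins for wav2vec 2.0 hidden states, constructed so that class information grows with depth; all function names (`train_probe`, `make_synthetic_layers`, `probe_all_layers`) and all sizes are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; each row becomes a probability vector."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_probe(X, y, n_classes, lr=0.1, epochs=300):
    """Fit a linear softmax probe with plain batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def make_synthetic_layers(n_frames=600, dim=16, n_classes=4, n_layers=4, seed=0):
    """Synthetic stand-in for per-layer hidden states (assumption, not real data).

    The class signal is absent in the first layer and strongest in the last.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, n_classes, size=n_frames)
    centers = rng.normal(size=(n_classes, dim))
    layers = []
    for k in range(n_layers):
        strength = 3.0 * k / (n_layers - 1)
        layers.append(strength * centers[y] + rng.normal(size=(n_frames, dim)))
    return layers, y


def probe_all_layers(layers, y, n_classes, train_frac=0.7):
    """Train one probe per layer and return held-out accuracy for each layer."""
    split = int(train_frac * len(y))
    accuracies = []
    for X in layers:
        # Standardize with training statistics only.
        mu = X[:split].mean(axis=0)
        sd = X[:split].std(axis=0) + 1e-8
        Xn = (X - mu) / sd
        W, b = train_probe(Xn[:split], y[:split], n_classes)
        P = softmax(Xn[split:] @ W + b)  # one probability vector per frame
        accuracies.append(float((P.argmax(axis=1) == y[split:]).mean()))
    return accuracies


if __name__ == "__main__":
    layers, y = make_synthetic_layers()
    for k, acc in enumerate(probe_all_layers(layers, y, n_classes=4)):
        print(f"layer {k}: probe accuracy = {acc:.2f}")
```

With real activations, `layers` would instead hold the hidden states extracted from each transformer layer and `y` the frame-aligned grapheme labels. How frames are aligned to graphemes is left open here, since the abstract does not specify it.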

Faculty

Faculteit der Sociale Wetenschappen