Completely Unsupervised Phone Segmentation, Clustering and Classification

Issue Date

2022-09-01

Language

en

Abstract

Automatic speech recognition (ASR) often relies, at least in part, on supervised learning. However, labelled data sets are not available for every task, and for many types of ASR training, phone-level annotation of the audio would be of substantial help. This paper presents, and partly implements, a completely unsupervised network for segmenting and classifying phones from a speech signal. A novel unsupervised phone segmentation method based on clustering is presented, which achieves an accuracy of 80%. This paper also presents several clustering methods and discusses their results at the level of both individual phones and broad phone groups. We examine the advantages and disadvantages of using a clustering algorithm as a pre-step to unsupervised classification, discuss the performance in detail, and look at the effect of phone context. We then discuss an unsupervised method for classifying the resulting cluster indices as phone labels; this method uses a generative adversarial network (GAN) that works without parallel data. Finally, we discuss the most recent advancements in GANs that could potentially be of use in this classification problem.
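
The abstract does not say how the clustering-based segmentation works. As a rough illustration only, and not the thesis's own method, one simple way to turn frame clustering into segmentation is to cluster acoustic frames and place a candidate phone boundary wherever the cluster index of two consecutive frames changes. The sketch below assumes precomputed frame-level feature vectors (for example MFCCs); the function names `kmeans` and `boundaries_from_labels`, the farthest-point initialisation and the toy data are all hypothetical choices made for this example.

```python
import numpy as np


def kmeans(features: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialisation.

    Returns one cluster index per frame (row of `features`).
    """
    x = features.astype(float)
    # Farthest-point initialisation: start from the first frame, then repeatedly
    # add the frame that is farthest from its nearest already-chosen centroid.
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign every frame to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return labels


def boundaries_from_labels(labels: np.ndarray) -> list[int]:
    """Frame indices where the cluster index changes: candidate phone boundaries."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    # Hypothetical toy "utterance": 30 frames of 2-D features forming three
    # stationary stretches, standing in for frames of three different phones.
    rng = np.random.default_rng(0)
    means = np.repeat([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]], 10, axis=0)
    frames = means + rng.normal(scale=0.1, size=means.shape)
    labels = kmeans(frames, k=3)
    print(boundaries_from_labels(labels))  # expected: [10, 20]
```

On real speech, frame-level cluster indices would likely flicker between neighbouring clusters, so a practical system would probably need some smoothing or a minimum segment duration before boundaries are read off.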

Faculty

Faculteit der Sociale Wetenschappen (Faculty of Social Sciences)