Completely Unsupervised Phone Segmentation, Clustering and Classification
Issue Date
2022-09-01
Language
en
Abstract
Automatic speech recognition (ASR) often relies at least partly on
supervised learning. However, labelled data sets are not available for all
tasks. For many types of ASR training, the availability of phone-level
annotation of audio would be of substantial help. This paper presents and in
part implements a completely unsupervised network for segmenting and
classifying phones from a speech signal. A novel unsupervised phone
segmentation method based on clustering is presented, which achieves an
accuracy of 80%. This paper also presents different clustering methods, and
the results are discussed, looking at both individual phones and broad
phone groups. We look at the advantages and disadvantages of using a
clustering algorithm as a pre-step to unsupervised classification. We discuss
the performance in detail and look at the effect of phone context. An
unsupervised method for classifying the resulting cluster indices into phone
labels is discussed. This method uses a generative adversarial network
(GAN) that works without using parallel data. Finally, we discuss the most
recent advancements in GANs that could be of use for this
classification problem.
Faculty
Faculteit der Sociale Wetenschappen