Automatic Subtitle Generation for Dutch TV Content
Issue Date
2022-05-03
Language
en
Abstract
Subtitles are a necessary medium of communication for those who are hearing impaired. To develop
methods to more easily create these subtitles, this study investigates the relatively unexplored field of automatically
generating subtitles for Dutch TV content. We study and implement three modules: speech recognition, punctuation
restoration, and subtitle segmentation, which together form a pipeline for the automatic generation of subtitles. We
implement, optimize, and evaluate the state of the art for these individual modules to provide a clear overview of available
techniques and their performance. To realize this, a representative, labelled speech dataset of extracted fragments from
a Dutch TV show was created, alongside multiple subtitle-based datasets and language models. The pipeline
consisting of the best performing models for each module is implemented and evaluated by human annotators. Our
contribution is a full-fledged pipeline to automatically create subtitles for Dutch TV content based on open source
models, as well as a framework to stimulate further research on the individual modules and subtitle generation in
general.
Keywords: Automatic Speech Recognition, Subtitle Generation, Punctuation Restoration, Subtitle Segmentation
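The three-module structure described in the abstract (speech recognition, then punctuation restoration, then subtitle segmentation) can be illustrated with a minimal sketch. This is not the study's implementation: the recognizer and punctuation functions are hypothetical stand-ins for real models, the 42-character line limit is an assumed value, and the greedy segmentation rule is only one simple possibility.

```python
from typing import Callable, List

MAX_CHARS = 42  # assumed per-line character limit; not taken from the study


def segment(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack words into subtitle lines.

    A line is closed when adding the next word would exceed max_chars,
    or right after a word ending in sentence-final punctuation.
    A single word longer than max_chars still gets a line of its own.
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            lines.append(current)
            current = word
        else:
            current = candidate
        if current.endswith((".", "?", "!")):
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return lines


def make_pipeline(
    recognize: Callable[[bytes], str],
    punctuate: Callable[[str], str],
    segment_fn: Callable[[str], List[str]],
) -> Callable[[bytes], List[str]]:
    """Chain the three modules: audio -> raw text -> punctuated text -> lines."""
    def run(audio: bytes) -> List[str]:
        return segment_fn(punctuate(recognize(audio)))
    return run


# Hypothetical stand-ins for the speech recognition and punctuation models.
def fake_recognize(audio: bytes) -> str:
    return "goedenavond welkom bij het nieuws"


def fake_punctuate(text: str) -> str:
    return text[0].upper() + text[1:] + "."


pipeline = make_pipeline(fake_recognize, fake_punctuate, segment)
subtitles = pipeline(b"")
```

Because each module is passed in as a plain function, any one of them could be swapped for a different model without touching the other two, which mirrors the per-module evaluation the abstract describes.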
Faculty
Faculteit der Sociale Wetenschappen