Multi-Label Classification of Movie Genres using Text-based Features and WordNet Hypernyms
Issue Date
2010-06-18
Language
en
Abstract
Text categorization techniques have become increasingly important in
the past decade. Whereas many approaches rely on video or audio features
for classifying digital media, text-based features provide a considerable
amount of information and are computationally inexpensive to process. In
this thesis we present a large database of natural-language movie subtitles,
which is used to predict genre labels in a multi-label classification
problem. We provide methods to extract text-based features and reduce attribute
dimensionality effectively. We also demonstrate the generation of a
second dataset using WordNet, where all words from the original subtitles
are replaced by their direct hypernyms. Each dataset is further evaluated
with and without a TF-IDF transformation, giving four experimental
conditions. We hypothesize that the dataset containing hypernyms will
outperform the original dataset of text-based features. Furthermore, we
hypothesize that the TF-IDF transformation has a positive effect on
classification accuracy. A selection of multi-label classification
techniques was tested on the four conditions. The results show high
classification performance but no significant difference between the four
experimental conditions.
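
The four experimental conditions described in the abstract (original words or hypernyms, each with or without TF-IDF weighting) can be illustrated with a minimal sketch. This is not the thesis implementation: the dictionary TOY_HYPERNYMS is a hypothetical stand-in for a WordNet lookup, the function names are invented for illustration, the example subtitles are made up, and idf = log(N / df) is only one common TF-IDF variant. The multi-label classifiers themselves are not shown.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to its direct hypernym.
TOY_HYPERNYMS = {"dog": "canine", "cat": "feline", "gun": "weapon", "knife": "weapon"}


def to_hypernyms(tokens, hypernyms=TOY_HYPERNYMS):
    """Replace each token by its direct hypernym; keep tokens with no entry."""
    return [hypernyms.get(token, token) for token in tokens]


def term_counts(docs):
    """Raw term-frequency vectors, one Counter per document."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """TF-IDF weights with idf = log(N / df) (an assumed, common variant)."""
    n_docs = len(docs)
    counts = term_counts(docs)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]


def build_conditions(docs):
    """Build the 2 x 2 grid: (original | hypernym) x (counts | tfidf)."""
    hyper_docs = [to_hypernyms(doc) for doc in docs]
    return {
        ("original", "counts"): term_counts(docs),
        ("original", "tfidf"): tf_idf(docs),
        ("hypernym", "counts"): term_counts(hyper_docs),
        ("hypernym", "tfidf"): tf_idf(hyper_docs),
    }


# Made-up toy "subtitles", already tokenized.
subtitles = [
    "the dog saw the gun".split(),
    "the cat saw the knife".split(),
    "the dog chased the cat".split(),
]
conditions = build_conditions(subtitles)
```

In this toy data, "gun" and "knife" collapse into the single feature "weapon", so the hypernym datasets have a smaller vocabulary than the original ones. That kind of generalization is the effect the hypernym hypothesis relies on.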
Faculty
Faculteit der Sociale Wetenschappen