Multi-Label Classification of Movie Genres using Text-based Features and WordNet Hypernyms

Issue Date
2010-06-18
Language
en
Abstract
Text categorization techniques have become increasingly important in the past decade. Whereas many approaches rely on video or audio features for classifying digital media, text-based features provide a considerable amount of information and are computationally inexpensive to process. In this thesis, we present a large database of natural-language movie subtitles, which we use to predict genre labels in a multi-label classification problem. We provide methods to extract text-based features and to reduce attribute dimensionality effectively. We also demonstrate the generation of a second dataset using WordNet, in which all words from the original subtitles are replaced by their direct hypernyms. Each dataset is further divided by whether or not a TF-IDF transformation is applied, yielding four experimental conditions. We hypothesize that the dataset containing hypernyms will outperform the original dataset of text-based features. Furthermore, we hypothesize that the TF-IDF transformation has a positive effect on classification accuracy. A selection of multi-label classification techniques was tested on the four conditions. Results show high classification performance overall, but no significant difference among the four experimental conditions.
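The four experimental conditions described in the abstract (original words versus hypernyms, each with or without TF-IDF) can be illustrated with a minimal sketch. It is not the thesis implementation. The `TOY_HYPERNYMS` dictionary is a hypothetical stand-in for WordNet lookups, the function names are invented for illustration, and the weighting (relative term frequency times the logarithm of inverse document frequency) is one common TF-IDF variant assumed here, since the abstract does not state which formula was used.

```python
import math
from collections import Counter

# Hypothetical toy mapping standing in for WordNet direct hypernyms.
# In WordNet a word can have several senses and hypernyms; this sketch
# assumes a single hypernym per word for simplicity.
TOY_HYPERNYMS = {
    "dog": "canine",
    "cat": "feline",
    "gun": "weapon",
    "knife": "weapon",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def to_hypernyms(tokens, hypernyms=TOY_HYPERNYMS):
    """Replace each token by its hypernym; keep tokens without an entry."""
    return [hypernyms.get(token, token) for token in tokens]


def tf_idf(docs):
    """Weight each term by relative frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def build_conditions(raw_texts):
    """Return feature dictionaries for the four conditions, keyed by (dataset, weighting)."""
    original = [tokenize(text) for text in raw_texts]
    hypernym = [to_hypernyms(doc) for doc in original]
    return {
        ("original", "counts"): [dict(Counter(doc)) for doc in original],
        ("original", "tfidf"): tf_idf(original),
        ("hypernym", "counts"): [dict(Counter(doc)) for doc in hypernym],
        ("hypernym", "tfidf"): tf_idf(hypernym),
    }


if __name__ == "__main__":
    # Made-up example lines, not taken from the thesis data.
    subtitles = ["the gun and the knife", "the dog chased the cat"]
    for name, features in build_conditions(subtitles).items():
        print(name, features)
```

In this sketch, "gun" and "knife" collapse into the single feature "weapon" in the hypernym dataset, which shows how hypernym replacement can shrink the vocabulary. A term that occurs in every document, such as "the", receives a TF-IDF weight of zero under the assumed formula.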
Faculty
Faculteit der Sociale Wetenschappen