Multi-Label Classification of Movie Genres using Text-based Features and WordNet Hypernyms

Issue Date

2010-06-18

Language

en

Abstract

Text categorization techniques have become increasingly important in the past decade. Whereas many approaches rely on video or audio features for classifying digital media, text-based features provide a considerable amount of information and are computationally inexpensive to process. In this thesis we present a large database of natural-language movie subtitles, which is used to predict genre labels in a multi-label classification problem. We provide methods to extract text-based features and to reduce attribute dimensionality effectively. We also demonstrate the generation of a second dataset using WordNet, in which all words from the original subtitles are replaced by their direct hypernyms. Each dataset is further split according to whether or not a TF-IDF transformation is applied, giving four experimental conditions. We hypothesize that the dataset containing hypernyms will outperform the original dataset of text-based features. Furthermore, we hypothesize that the TF-IDF transformation has a positive effect on classification accuracy. A selection of multi-label classification techniques was tested on these four conditions. The results show strong classification performance overall, but no significant difference between the four experimental conditions.
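
As an illustration only, the two transformations named in the abstract (hypernym replacement and TF-IDF weighting) might be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: `TOY_HYPERNYMS` is a hypothetical hand-written stand-in for a WordNet lookup, the function names are invented for this example, and the weighting `tf * log(N / df)` is one common TF-IDF variant that the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet direct-hypernym lookup (assumption:
# a real pipeline would query WordNet instead of this hand-written dict).
TOY_HYPERNYMS = {"dog": "canine", "cat": "feline", "gun": "weapon", "sword": "weapon"}


def replace_with_hypernyms(tokens, hypernyms=TOY_HYPERNYMS):
    """Replace each token by its direct hypernym when one is known."""
    return [hypernyms.get(token, token) for token in tokens]


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumption: this particular weighting is illustrative; the abstract
    does not say which TF-IDF variant was used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weights


# Two toy "subtitle" documents, already tokenized.
subtitles = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "gun", "and", "the", "sword"],
]

# The four conditions: original vs. hypernym tokens, each with or without TF-IDF.
hyper_docs = [replace_with_hypernyms(doc) for doc in subtitles]
plain_weights = tf_idf(subtitles)
hyper_weights = tf_idf(hyper_docs)
```

In this toy example, "gun" and "sword" collapse into the single feature "weapon", which is the kind of dimensionality reduction and generalization the hypernym dataset is meant to provide; a term that occurs in every document, such as "the", receives a TF-IDF weight of zero.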

Faculty

Faculteit der Sociale Wetenschappen