Text Classification - Classifying events to Ugenda calendar genres

Issue Date

2016-07-18

Language

en

Abstract

Ugenda is a leading cultural event website that faces a challenge in managing the information in its event calendar. Verifying and preparing the information supplied by venues is time-consuming, and the team is looking for a way to automate this process. Certain event details, such as the event genre, are often missing, yet genres are important for sorting the calendar. This thesis proposes a solution for automatically labeling events that lack a genre. The focus is on three subjects: event details, pre-processing techniques, and classification methods. We try to find a combination that works well enough for an operating website. The pre-processing methods included natural language processing, HTML tag removal, and feature mapping of date, time, and location. The four classifiers used were support vector machines, logistic regression, naïve Bayes, and random forest. Results show that the logistic regression classifier performs best when the complete set of proposed pre-processing methods and event details is used. An F1-score of 0.8110 was achieved, which is not enough for an operating website.
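
To make two of the steps named in the abstract concrete, the following is a minimal sketch of HTML tag removal and of an F1 computation over genre labels. It is not the thesis implementation: the function names `strip_html` and `macro_f1` are hypothetical, the sample strings and genres are invented for illustration, and macro-averaging is an assumption, since the abstract does not say how the reported F1-score was averaged.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the text content of an HTML fragment with whitespace collapsed.

    Hypothetical helper; the thesis does not describe its tag-removal code.
    """
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-genre F1 scores.

    Assumption: macro-averaging. The abstract does not state the averaging.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the genre never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented example data, for illustration only.
    print(strip_html("<p>Jazz <b>night</b> at 20:00</p>"))
    true_genres = ["jazz", "jazz", "theatre", "film"]
    predicted = ["jazz", "theatre", "theatre", "film"]
    print(round(macro_f1(true_genres, predicted), 4))
```

With macro-averaging, every genre counts equally regardless of how many events carry it, so rare genres that are classified poorly pull the score down more than they would under a micro-averaged or weighted score.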

Faculty

Faculteit der Sociale Wetenschappen