Text Classification - Classifying events to Ugenda calendar genres

Date
2016-07-18
Language
en
Abstract
Ugenda is a leading cultural event website that faces a challenge in managing the information in its event calendar. Verifying and preparing the information supplied by venues is time-consuming, and the team is looking for a way to automate this process. Certain event details, such as the event genre, are often missing, yet genres are important for sorting the calendar. This thesis proposes a solution for automatically labeling events that lack a genre. The focus is on three aspects: event details, pre-processing techniques and classification methods. We try to find a combination that works well enough for an operating website. The pre-processing methods included natural language processing, HTML tag removal, and feature mapping of date, time and location. The four classifiers used were support vector machines, logistic regression, naive Bayes and random forest. Results show that the logistic regression classifier performs best when combined with the complete set of proposed pre-processing methods and event details. It achieved an F1-score of 0.8110, which is not enough for an operating website.
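The pre-processing steps named in the abstract (HTML tag removal and mapping date, time and location to features) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the thesis's implementation: the event fields `description`, `start` and `venue`, the day-part boundaries, and all function names are hypothetical, and only the Python standard library is used.

```python
from datetime import datetime
from html.parser import HTMLParser

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Remove tags, decode entities and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def time_features(start):
    """Map a start datetime to coarse tokens (assumed buckets)."""
    if start.hour < 12:
        part = "morning"
    elif start.hour < 18:
        part = "afternoon"
    else:
        part = "evening"
    return ["weekday_" + WEEKDAYS[start.weekday()], "daypart_" + part]


def event_to_tokens(event):
    """Turn one event (hypothetical dict layout) into a token list
    that a bag-of-words classifier could consume."""
    tokens = strip_html(event["description"]).lower().split()
    tokens += time_features(event["start"])
    # Location is mapped to a single categorical token.
    tokens.append("venue_" + event["venue"].lower().replace(" ", "_"))
    return tokens


example = {
    "description": "<p>Live <b>jazz</b> &amp; blues</p>",
    "start": datetime(2016, 7, 18, 20, 30),
    "venue": "Example Hall",  # made-up venue name
}
print(event_to_tokens(example))
```

Encoding date, time and location as extra tokens keeps every event detail in one representation, so the same token list can be passed to any of the four classifiers for comparison.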
Faculty
Faculteit der Sociale Wetenschappen