Text Classification - Classifying events to Ugenda calendar genres
Authors
Issue Date
2016-07-18
Language
en
Abstract
Ugenda is a leading cultural event website that faces a challenge in
the information management of its event calendar. The process of
verifying and preparing information from venues is time-consuming
and the team is looking for a way to automate this process. Certain
event details are often missing, such as the event genres. These are
important for sorting the calendar. This thesis proposes a solution
for automatically labeling events that lack a genre. The focus is on
three subjects: event details, pre-processing techniques and classification
methods. We try to find a combination that works well enough for
an operating website. The pre-processing methods included natural
language processing, HTML tag removal, and feature mapping for date, time
and location. The four classifiers used were support vector machines, logistic
regression, naïve Bayes and random forest. Results show that the logistic
regression classifier has the best performance with a complete setup
of the proposed pre-processing methods and event details. An F1-score of
0.8110 was achieved, which is not enough for an operating website.
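
As a rough illustration of two ideas named in the abstract, the sketch below shows HTML tag removal as a pre-processing step and the F1-score as the harmonic mean of precision and recall. It is not the thesis implementation: the function names `strip_html` and `f1_score`, the sample event text and the genre labels are all hypothetical, and the abstract does not say how the reported score was averaged across genres, so the example scores a single genre only.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the text of `raw` with HTML tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())


def f1_score(true_labels, predicted_labels, genre):
    """F1 for one genre: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == genre and p == genre)
    fp = sum(1 for t, p in pairs if t != genre and p == genre)
    fn = sum(1 for t, p in pairs if t == genre and p != genre)
    if tp == 0:
        return 0.0  # no correct predictions for this genre
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical event description and labels, for illustration only.
clean = strip_html("<p>Jazz <b>night</b> at 20:00</p>")
score = f1_score(
    ["music", "music", "theatre", "music"],
    ["music", "theatre", "theatre", "music"],
    "music",
)
```

Here the genre "music" has precision 1 and recall 2/3, so a single figure such as the reported 0.8110 summarizes both kinds of error at once.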
Faculty
Faculteit der Sociale Wetenschappen