Organizing Flickr30k Using Text Clustering
Keywords
Authors
Date
2018-06-18
Language
en
Abstract
Text clustering is the process of grouping similar documents based on the textual
information they contain. The captions provided with the Flickr30k dataset, which
consists of captioned images of everyday life, are used to organize the images.
Two approaches to clustering are implemented and the resulting clusters compared:
partitional clustering, using K-means, and hierarchical clustering, using
agglomerative clustering. The performance of the two algorithms is assessed
using internal validity measurements. The difference in these measurements between
the two algorithms was too small to judge which one performed better. However, the
clusters that were formed did differ: K-means distinguished between ‘adult people’
and ‘young people’, whereas agglomerative clustering distinguished between ‘people’
and ‘bullfighting’.
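
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which text features, linkage criterion, or internal validity measure were used, so everything below is an assumption made for illustration: normalised bag-of-words vectors, Euclidean distance, a naive K-means initialisation, average linkage, and the silhouette coefficient. The four captions are invented toy examples, not Flickr30k data, and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(captions):
    """Turn captions into L2-normalised bag-of-words vectors (assumed features)."""
    docs = [Counter(c.lower().split()) for c in captions]
    vocab = sorted({w for d in docs for w in d})
    vecs = []
    for d in docs:
        v = [d[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard empty captions
        vecs.append([x / norm for x in v])
    return vecs


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vecs, k, iters=20):
    """Partitional clustering: Lloyd's algorithm with a naive deterministic init."""
    centroids = [list(v) for v in vecs[:k]]  # assumption: first k points as seeds
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vecs]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agglomerative(vecs, k):
    """Hierarchical clustering: merge the closest pair (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vecs))]

    def linkage(a, b):
        return sum(dist(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    labels = [0] * len(vecs)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def silhouette(vecs, labels):
    """Mean silhouette coefficient, one possible internal validity measure."""
    scores = []
    for i, v in enumerate(vecs):
        by_cluster = {}
        for j, w in enumerate(vecs):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(v, w))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster overall
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented toy captions, for illustration only.
captions = [
    "a man rides a bike",
    "a dog runs on grass",
    "a man rides a horse",
    "a dog runs on sand",
]
vecs = vectorize(captions)
km_labels = kmeans(vecs, k=2)
ag_labels = agglomerative(vecs, k=2)
km_score = silhouette(vecs, km_labels)
ag_score = silhouette(vecs, ag_labels)
print("K-means:      ", km_labels, round(km_score, 3))
print("Agglomerative:", ag_labels, round(ag_score, 3))
```

On this toy input both algorithms recover the same two groups, so their silhouette scores are identical. On a real caption collection the partitions can differ even when the scores are close, which is the situation the abstract reports.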
Description
Citation
Supervisor
Faculty
Faculteit der Sociale Wetenschappen