Graph-based semi-supervised learning of semantic text clusters
Issue Date
2017-05-15
Language
en
Abstract
The bag-of-words model is a common approach to representing documents for all kinds of text mining tasks. However, the assumed independence of words does not reflect the complexity and context of human natural language. We propose a graph-based representation of document collections that includes documents and features together with their respective syntactic, semantic and frequency-based relations.
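As a rough illustration of such a representation, the sketch below builds a small weighted graph with document nodes and word nodes. It is not the construction used in the thesis: the function name `build_graph`, the node naming scheme, and the use of adjacent-token co-occurrence as a stand-in for syntactic or semantic word relations are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_graph(docs):
    """Return a weighted undirected graph as {node: {neighbor: weight}}.

    Hypothetical helper: document-word edges carry term frequencies,
    word-word edges count how often two words appear next to each other
    (an assumed proxy for word relations, not the thesis's definition).
    """
    graph = defaultdict(dict)

    def add_edge(u, v, weight):
        # Store the edge in both directions and accumulate repeated weights.
        graph[u][v] = graph[u].get(v, 0.0) + weight
        graph[v][u] = graph[v].get(u, 0.0) + weight

    for doc_id, text in docs.items():
        tokens = text.lower().split()
        # Frequency-based relation: document node <-> word node.
        for word, count in Counter(tokens).items():
            add_edge(("doc", doc_id), ("word", word), float(count))
        # Word relation: adjacent tokens are linked to each other.
        for left, right in zip(tokens, tokens[1:]):
            if left != right:
                add_edge(("word", left), ("word", right), 1.0)
    return dict(graph)


docs = {
    "d1": "graph learning uses graph structure",
    "d2": "text mining uses words",
}
graph = build_graph(docs)
```

In this toy graph, the two documents are connected only through the shared word node for "uses". A plain bag-of-words vector records that shared word as well, but cannot express the word-to-word edges.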
Using semi-supervised learning, an approach that incorporates the structure of unlabeled data in addition to labeled data for classifier training, we investigate the influence of different graph properties on text categorization. The results show that even though bag-of-words is a powerful approach, adding word relations significantly improves classification performance. Whether syntactic or semantic feature relations are used has, however, no significant influence.
Although graph-based semi-supervised learning outperforms bag-of-words-based supervised and semi-supervised learning approaches as the number of labeled documents is varied, it does not exploit the full potential of the unlabeled data.
A major advantage of graph-based methods is their flexibility: the document representation can be adapted closely to a specific text mining task.
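To make the idea of learning from unlabeled structure concrete, here is a minimal sketch of iterative label propagation on a toy graph. The abstract does not name the algorithm used in the thesis, so this clamped neighbor-averaging scheme is a stand-in, and `propagate_labels`, the node names, and the edge weights are all hypothetical.

```python
def propagate_labels(graph, seeds, iterations=50):
    """Spread seed labels over a weighted graph {node: {neighbor: weight}}.

    Each unlabeled node repeatedly takes the weighted average of its
    neighbors' class scores; labeled (seed) nodes stay clamped.
    """
    classes = sorted(set(seeds.values()))
    scores = {node: {c: 0.0 for c in classes} for node in graph}
    for node, label in seeds.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = scores[node]  # keep known labels fixed
                continue
            total = sum(neighbors.values())
            updated[node] = {
                c: (sum(w * scores[nb][c] for nb, w in neighbors.items()) / total
                    if total else 0.0)
                for c in classes
            }
        scores = updated
    # Pick the highest-scoring class for every node.
    return {node: max(s, key=s.get) for node, s in scores.items()}


# Toy graph: documents d1..d4 connect to word nodes; the two word nodes
# share a weak relation. Only d1 and d2 are labeled.
toy_graph = {
    "d1": {"w_goal": 1.0},
    "d2": {"w_vote": 1.0},
    "d3": {"w_goal": 1.0},
    "d4": {"w_vote": 1.0},
    "w_goal": {"d1": 1.0, "d3": 1.0, "w_vote": 0.2},
    "w_vote": {"d2": 1.0, "d4": 1.0, "w_goal": 0.2},
}
predicted = propagate_labels(toy_graph, {"d1": "sports", "d2": "politics"})
```

Here the unlabeled documents d3 and d4 receive labels only through the word nodes they share with the labeled documents, which is the sense in which the graph structure of unlabeled data contributes to training.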
Faculty
Faculteit der Sociale Wetenschappen