Graph-based semi-supervised learning of semantic text clusters

dc.contributor.advisor: Farquhar, J.D.R.
dc.contributor.advisor: Verberne, S.
dc.contributor.author: Widmann, N.
dc.date.issued: 2017-05-15
dc.description.abstract: The bag-of-words model is a common approach to representing documents for all kinds of text mining tasks. However, the assumption that words are independent does not reflect the complexity and context of natural human language. We propose a graph-based representation of document collections that includes documents and features together with their syntactic, semantic and frequency-based relations. Using semi-supervised learning, an approach that uses not only labeled data but also the structure of unlabeled data to train a classifier, we investigate the influence of different graph properties on text categorization. The results show that, although bag-of-words is a powerful approach, adding word relations significantly improves classification performance. Whether syntactic or semantic feature relations are used, however, has no significant influence. Although graph-based semi-supervised learning outperforms bag-of-words-based supervised and semi-supervised learning approaches across varying numbers of labeled documents, it is not able to exploit the full potential of the unlabeled data. The main advantage of graph-based methods is their flexibility: the document representation can be closely adapted to a specific text mining task. [en_US]
dc.identifier.uri: http://theses.ubn.ru.nl/handle/123456789/5238
dc.language.iso: en [en_US]
dc.thesis.faculty: Faculteit der Sociale Wetenschappen [en_US]
dc.thesis.specialisation: Master Artificial Intelligence [en_US]
dc.thesis.studyprogramme: Artificial Intelligence [en_US]
dc.thesis.type: Master [en_US]
dc.title: Graph-based semi-supervised learning of semantic text clusters [en_US]
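The abstract does not specify which propagation algorithm the thesis uses. As a rough illustration of graph-based semi-supervised learning on a graph of documents and word features, the following is a minimal sketch of generic iterative label propagation with clamped seed labels (in the spirit of Zhu and Ghahramani's harmonic-function method). It is an assumption, not the author's method: the node names, edge weights, class names and the word-word "semantic" edges are hypothetical toy data, and `build_graph` and `propagate_labels` are illustrative helpers.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map: node -> {neighbour: weight}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def propagate_labels(graph, seeds, classes, iterations=200, tol=1e-6):
    """Iterative label propagation with clamped seeds (hypothetical helper).

    seeds maps labelled document nodes to their class. Every unlabelled node
    repeatedly takes the weighted average of its neighbours' class scores.
    """
    uniform = {c: 1.0 / len(classes) for c in classes}
    # Labelled nodes start one-hot; all other nodes start uniform.
    dist = {
        n: ({c: float(c == seeds[n]) for c in classes} if n in seeds else dict(uniform))
        for n in graph
    }
    for _ in range(iterations):
        change = 0.0
        updated = {}
        for node, nbrs in graph.items():
            if node in seeds:
                updated[node] = dist[node]  # clamp the known labels
                continue
            total = sum(nbrs.values())
            scores = {c: sum(w * dist[m][c] for m, w in nbrs.items()) / total for c in classes}
            change = max(change, max(abs(scores[c] - dist[node][c]) for c in classes))
            updated[node] = scores
        dist = updated
        if change < tol:
            break
    return dist


# Hypothetical toy collection: document-word edges weighted by term frequency,
# plus word-word edges standing in for syntactic or semantic relations.
edges = [
    ("doc:A", "word:goal", 2), ("doc:A", "word:match", 1),
    ("doc:B", "word:vote", 2), ("doc:B", "word:party", 1),
    ("doc:C", "word:match", 1), ("doc:C", "word:team", 1),
    ("doc:D", "word:election", 2),
    ("word:team", "word:goal", 1),       # assumed semantic relation
    ("word:election", "word:vote", 1),   # assumed semantic relation
]
seeds = {"doc:A": "sport", "doc:B": "politics"}
classes = ["sport", "politics"]

result = propagate_labels(build_graph(edges), seeds, classes)
for doc in ("doc:C", "doc:D"):
    print(doc, max(result[doc], key=result[doc].get))
```

Note that `doc:D` shares no word with any labelled document; it receives a label only through the assumed word-word edge between `election` and `vote`. This mirrors, in a very simplified way, why adding word relations to a bag-of-words representation can help classification.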
Files
Name: Widmann, N._MSc_Thesis_2017.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
Description: Thesis text