Topic Modeling with Word2Vec based Noun Expansion for Dark Web Marketplace Analysis

Issue Date

2019-10-11

Language

en

Abstract

A novel approach for expanding documents is proposed to improve topic modeling on short text. The enrichment is based on expanding nouns with information from custom (e.g. domain-specific) and pretrained Word2Vec models. The quality of the three conditions (original, custom, and pretrained) is evaluated through manual analysis of the created topics and through the classification performance of a Support Vector Machine trained on the output of an LDA system. Manual analysis did not show a striking improvement in the topics created from the enriched texts compared to the original text. The prediction models show improved performance only when the text is enriched with information from the custom Word2Vec models. However, the extent of the improvement depends on the text domain.
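
The noun-expansion step described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the toy vectors stand in for a trained Word2Vec model, the set of nouns is supplied by hand where a part-of-speech tagger would normally be used, and the names `TOY_VECTORS`, `nearest`, and `expand_nouns` are hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "pills": [0.9, 0.1, 0.0],
    "tablets": [0.85, 0.15, 0.05],
    "capsules": [0.8, 0.2, 0.1],
    "account": [0.0, 0.9, 0.1],
    "login": [0.05, 0.85, 0.2],
    "shipping": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, vectors, topn):
    """Return the topn words most similar to `word`, or [] if it is unknown."""
    if word not in vectors:
        return []
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:topn]]


def expand_nouns(tokens, nouns, vectors, topn=2):
    """Append the nearest embedding neighbours after every noun token.

    `nouns` is an assumed, hand-supplied set of noun tokens; non-noun
    tokens and nouns missing from the embedding are left unchanged.
    """
    expanded = []
    for tok in tokens:
        expanded.append(tok)
        if tok in nouns:
            expanded.extend(nearest(tok, vectors, topn))
    return expanded


enriched = expand_nouns(["cheap", "pills"], {"pills"}, TOY_VECTORS)
print(enriched)
```

In such a setup, the enriched token lists would replace the original short texts as input to the topic model, and the resulting per-document topic distributions would serve as features for the classifier.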

Faculty

Faculteit der Sociale Wetenschappen