Abstract:
A novel approach for expanding documents is proposed to improve topic modeling on short
text. The enrichment is based on expanding nouns with information from custom (e.g.
domain-specific) and pretrained Word2Vec models. The quality of the different conditions
(original, custom, and pretrained) is evaluated with manual analysis of the created topics and
with the classification performance of a Support Vector Machine trained on the output of an
LDA system. Manual analysis did not show a striking improvement in the topics created from
the enriched texts compared to the original text. The prediction models show improved
performance only when the text is enriched with information from the custom Word2Vec
models. However, the extent of the improvement is dependent on the text domain.