Context-aware multimodal Recurrent Neural Network for automatic image captioning
Issue Date
2016-08-25
Language
en
Abstract
Automatic image captioning is the computer vision task of describing an image with text. In many cases an image is accompanied by text, for instance in books or news articles. In this study a context-aware model is proposed that uses not only the image but also the text surrounding it to generate a description. The model uses a joint LSTM with attention over both the image and the context, and is trained on the Microsoft COCO dataset. Several setups for representing the surrounding text as a feature vector are also explored. Results show quantitative and qualitative improvements when context is included. Future directions include automating the feature crafting and applying the model to more datasets.
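
As a rough illustration of the kind of architecture the abstract describes, below is a minimal PyTorch-style sketch of an LSTM decoder that attends jointly over image region features and a context (surrounding-text) feature vector. The layer sizes, the additive attention form, and the class name ContextAwareCaptioner are illustrative assumptions, not the implementation used in the thesis.

# Minimal sketch, assuming CNN region features for the image and a precomputed
# feature vector for the surrounding text; not the thesis implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 image_dim=2048, context_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project image regions and the context vector into a shared space.
        self.img_proj = nn.Linear(image_dim, hidden_dim)
        self.ctx_proj = nn.Linear(context_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)
        # Joint LSTM: word embedding plus the attended feature at each step.
        self.lstm = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_regions, context_vec, captions):
        # image_regions: (B, R, image_dim)  CNN region features
        # context_vec:   (B, context_dim)   feature vector of the surrounding text
        # captions:      (B, T)             ground-truth word indices (teacher forcing)
        B, T = captions.shape
        feats = torch.cat([self.img_proj(image_regions),
                           self.ctx_proj(context_vec).unsqueeze(1)], dim=1)  # (B, R+1, H)
        h = feats.mean(dim=1)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):
            # Attention over the image regions and the context slot together.
            scores = self.attn(torch.tanh(feats + h.unsqueeze(1))).squeeze(-1)  # (B, R+1)
            alpha = F.softmax(scores, dim=-1)
            attended = (alpha.unsqueeze(-1) * feats).sum(dim=1)                 # (B, H)
            x = torch.cat([self.embed(captions[:, t]), attended], dim=-1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size), per-step word scores

At generation time the same loop would feed back the previously sampled word instead of the ground-truth caption; that detail is omitted here for brevity.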
Faculty
Faculteit der Sociale Wetenschappen