This is an implementation of the paper "Show and Tell: A Neural Image Caption Generator" by Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. The full text is available at https://arxiv.org/abs/1411.4555
The dataset used is Flickr8k. The data can be requested here. After the request, an email is sent to the address you provide, containing the images (Flickr8K_Data) and the text data (Flickr8K_Text) as zip archives, which need to be unzipped. The code uses Keras with the TensorFlow backend.
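
The unzip step can be scripted with the Python standard library. The snippet below is a minimal sketch, not part of this repository: the helper name `extract_archives` is hypothetical, and the archive file names in the usage note are assumed from the folder names mentioned above, so adjust them to match the files you actually receive.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_paths, dest_dir):
    """Extract each zip archive into its own subfolder of dest_dir.

    Hypothetical helper for illustration; returns the list of
    folders that were created, one per archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in map(Path, archive_paths):
        # Name the target folder after the archive, without ".zip".
        target = dest / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

For example, assuming the downloads are saved as `Flickr8K_Data.zip` and `Flickr8K_Text.zip` in the working directory, `extract_archives(["Flickr8K_Data.zip", "Flickr8K_Text.zip"], "data")` would unpack them into `data/Flickr8K_Data` and `data/Flickr8K_Text`.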