In the project, we implemented a Neural Network for Image Captioning. In this model, we give an image to the network and generate sentences that describe the image. This model implements a CNN for identifying high-level image features and an LSTM for building sentences. We use the Flickr8k dataset to train our image captioning network. It includes 8091 RGB images and 40455 sentences, five sentences for each image gathered from different persons.
Flickr8k dataset: https://www.kaggle.com/adityajn105/flickr8k