Computer Vision and NLP techniques are combined (CNN & LSTM) to understand the content of an image and generate a coherent and relevant description

YoussefAboelwafa/Image-Caption-Generator

Image Caption Generator

Image captioning is the process of generating textual descriptions for images automatically. It combines computer vision and natural language processing techniques to understand the content of an image and generate a coherent and relevant description.

CNN & RNN (LSTM)

Image captioning requires two deep learning models combined into a single model for training.
A CNN extracts features from the image as a fixed-size vector, also known as the image embedding. The size of this embedding depends on the pretrained network used for feature extraction. An LSTM handles text generation: the image embedding is concatenated with the word embeddings and passed to the LSTM to generate the next word.
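
The concatenation step described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the notebook's actual model: the class name `CaptionDecoder`, the layer sizes, and the 2048-dimensional image embedding are hypothetical, and a random tensor stands in for the output of the pretrained CNN. Feeding the image embedding at every time step is also only one possible reading of the description.

```python
import torch
import torch.nn as nn


class CaptionDecoder(nn.Module):
    """Hypothetical LSTM decoder that sees the image embedding at every step."""

    def __init__(self, vocab_size, image_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM input is the word embedding concatenated with the image embedding.
        self.lstm = nn.LSTM(embed_dim + image_dim, hidden_dim, batch_first=True)
        # Projects each hidden state to a score for every word in the vocabulary.
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_features, captions):
        # image_features: (batch, image_dim); captions: (batch, seq_len) token ids
        words = self.embed(captions)  # (batch, seq_len, embed_dim)
        seq_len = captions.size(1)
        # Repeat the image embedding along the time axis.
        images = image_features.unsqueeze(1).expand(-1, seq_len, -1)
        lstm_in = torch.cat([words, images], dim=2)
        hidden, _ = self.lstm(lstm_in)  # (batch, seq_len, hidden_dim)
        return self.fc(hidden)  # (batch, seq_len, vocab_size)


# Assumed sizes: batch of 4 images, captions of 12 tokens, vocabulary of 1000 words.
features = torch.randn(4, 2048)  # stand-in for pretrained CNN features
captions = torch.randint(0, 1000, (4, 12))
decoder = CaptionDecoder(vocab_size=1000)
logits = decoder(features, captions)
```

At each position, `logits` holds one score per vocabulary word, so training would typically compare position `t` against the token at position `t + 1` with a cross-entropy loss.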

Dataset

Flickr 8k is a benchmark collection for sentence-based image description and search. It consists of 8,000 images, each paired with five different captions that provide clear descriptions of the entities and events.
The images do not contain any well-known people or locations; they were manually selected to depict a variety of scenes and situations.

Kaggle: Flickr 8k Dataset
Hugging Face: Flickr 8k Dataset
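
Because each image is paired with several captions, a common first step is to group the captions by image. The sketch below assumes the captions arrive as a CSV with `image` and `caption` columns; that layout, the sample rows, the `load_captions` helper, and the `<start>`/`<end>` markers are all illustrative assumptions rather than details taken from the notebook.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed "image,caption" CSV layout.
SAMPLE = """image,caption
dog.jpg,A dog runs across the grass .
dog.jpg,A brown dog is running outside .
kids.jpg,Two children play on a swing .
"""


def load_captions(handle):
    """Group captions by image file name and wrap each in start/end markers."""
    captions = defaultdict(list)
    for row in csv.DictReader(handle):
        text = row["caption"].strip().lower()
        captions[row["image"]].append(f"<start> {text} <end>")
    return dict(captions)


captions_by_image = load_captions(io.StringIO(SAMPLE))
```

With the real dataset, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, and each image would map to its five captions.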


Notebook

Link to Kaggle notebook: Image-Caption-Generator_CNN-LSTM (PyTorch)


Caption Generation (Good Examples)

output_37_3
output_39_2
output_39_9
output_37_1


Caption Generation (Bad Examples)

output_37_2
output_39_8
output_39_7
