Projects of Udacity Computer Vision Nanodegree, project 1 - facial keypoint detection, project 2 - Image Captioning, project 3 - SLAM and extra curricular project on code optimization

Tandon-A/CVND_Udacity

CVND_Udacity

Projects of Udacity Computer Vision Nanodegree

Project 1 builds a CNN model to detect facial keypoints in an image. Facial keypoints are the points of interest on a human face, such as the corners of the eyes and mouth.

Detected facial keypoints make it possible to build facial image manipulation applications.

Keypoint1 Keypoint2

Fig 1: Predicted facial Keypoints

Manipulation

Fig 2: Sample image manipulation using facial keypoints

Project 2 develops a deep learning model that generates captions for images. It uses a CNN-RNN architecture following the paper Show and Tell.

CNN-RNN model

Image captioning can be used to provide verbal descriptions to partially or completely visually impaired people through a headset. It can also be used to build a query-based image search engine that does not need manually annotated images.

Some sample captions generated by the trained model are shown below.

Caption 1 Caption 2 Caption 3

Fig 4: Generated Image Captions

Project 3 performs landmark detection and tracking using simultaneous localization and mapping (SLAM) in a 2D world. The implementation uses Graph SLAM.

SLAM Image

Fig 5: Final location of the robot found using SLAM

Using the robot's sensor measurements, SLAM estimates the positions of both the robot and the landmarks in the world. A map of the environment is therefore built while the robot is being localized.
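To make the Graph SLAM idea concrete, here is a minimal sketch in a 1D world rather than the 2D world used in the project. It is not the repository's code: the function names `graph_slam_1d` and `solve_linear`, the equal weighting of all constraints, and the input layout are assumptions made for illustration. Each motion and each landmark measurement adds a constraint to an information matrix `omega` and vector `xi`, and solving `omega * mu = xi` gives the estimated poses and landmark positions.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def graph_slam_1d(initial_pos, motions, measurements, num_landmarks):
    """Estimate poses and landmarks in a 1D world (illustrative only).

    motions[t] is the displacement from pose t to pose t + 1.
    measurements is a list of (pose_index, landmark_index, signed_distance).
    All constraints get unit weight, which is an assumption of this sketch.
    """
    num_poses = len(motions) + 1
    size = num_poses + num_landmarks
    omega = [[0.0] * size for _ in range(size)]
    xi = [0.0] * size

    # Anchor the first pose so the system has a unique solution.
    omega[0][0] += 1.0
    xi[0] += initial_pos

    def add_constraint(i, j, d):
        # Encodes the relation x_j - x_i = d.
        omega[i][i] += 1.0
        omega[j][j] += 1.0
        omega[i][j] -= 1.0
        omega[j][i] -= 1.0
        xi[i] -= d
        xi[j] += d

    for t, dx in enumerate(motions):
        add_constraint(t, t + 1, dx)
    for pose, landmark, dist in measurements:
        add_constraint(pose, num_poses + landmark, dist)

    mu = solve_linear(omega, xi)
    return mu[:num_poses], mu[num_poses:]


poses, landmarks = graph_slam_1d(
    initial_pos=0.0,
    motions=[5.0, 3.0],
    measurements=[(0, 0, 10.0), (1, 0, 5.0), (2, 0, 2.0)],
    num_landmarks=1,
)
```

With noise-free inputs like these, the constraints agree with each other and the solution reproduces them exactly. With noisy inputs, the same solve returns a least-squares compromise between the constraints.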

The extracurricular project optimizes the C++ code of a 2D histogram filter. Code optimizations reduce a program's execution time and memory footprint, which makes it feasible to run the code on an embedded device or in real-time scenarios.
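For readers unfamiliar with a 2D histogram filter, the sketch below shows what three of its steps (initialize beliefs, sense, normalize) might look like. It is written in Python for brevity and is not the project's C++ code. The function signatures, the `p_hit` and `p_miss` weights, and the choice to normalize inside `sense` are assumptions made for illustration.

```python
def initialize_beliefs(grid):
    """Start with a uniform belief over every cell of the grid."""
    height, width = len(grid), len(grid[0])
    p = 1.0 / (height * width)
    return [[p] * width for _ in range(height)]


def normalize(beliefs):
    """Rescale the beliefs so they sum to 1."""
    total = sum(sum(row) for row in beliefs)
    return [[cell / total for cell in row] for row in beliefs]


def sense(color, grid, beliefs, p_hit, p_miss):
    """Weight each cell by how well it matches the observed color.

    p_hit and p_miss are hypothetical weights for matching and
    non-matching cells; normalizing here is an assumption of this sketch.
    """
    updated = [
        [b * (p_hit if cell == color else p_miss)
         for b, cell in zip(belief_row, grid_row)]
        for belief_row, grid_row in zip(beliefs, grid)
    ]
    return normalize(updated)


grid = [["r", "g"],
        ["g", "g"]]
prior = initialize_beliefs(grid)
posterior = sense("r", grid, prior, p_hit=3.0, p_miss=1.0)
```

After sensing red, the only red cell holds half of the probability mass and each green cell holds one sixth.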

Execution time is measured in milliseconds by running every function for 10,000 iterations. The best total execution time, 16.877 milliseconds, is achieved by the optimized code compiled with the GCC -O3 flag.

| File name | Original code (ms) | Optimized code (ms) | Optimized code with GCC -O3 (ms) |
| --- | --- | --- | --- |
| Initialize Beliefs | 43.42 | 13.518 | 1.802 |
| Sense | 56.057 | 14.967 | 3.444 |
| Blur | 151.49 | 67.38 | 7.748 |
| Normalize | 56.39 | 13.157 | 1.573 |
| Move | 51.566 | 16.536 | 2.31 |
| Total | 358.923 | 125.558 | 16.877 |
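The timings above can be turned into speedup factors with a few lines of arithmetic. The sketch below copies the reported numbers and recomputes the totals. The names `timings_ms` and `speedup` are illustrative and do not come from the repository.

```python
# Reported timings in milliseconds:
# (original, optimized, optimized with GCC -O3)
timings_ms = {
    "Initialize Beliefs": (43.42, 13.518, 1.802),
    "Sense": (56.057, 14.967, 3.444),
    "Blur": (151.49, 67.38, 7.748),
    "Normalize": (56.39, 13.157, 1.573),
    "Move": (51.566, 16.536, 2.31),
}


def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


# Column-wise sums reproduce the "Total" row of the table.
totals = tuple(sum(column) for column in zip(*timings_ms.values()))

speedup_optimized = speedup(totals[0], totals[1])
speedup_with_o3 = speedup(totals[0], totals[2])

for name, (original, optimized, with_o3) in timings_ms.items():
    print(f"{name}: {speedup(original, optimized):.2f}x optimized, "
          f"{speedup(original, with_o3):.2f}x with -O3")
```

Overall, the source-level optimizations alone give roughly a 2.9x speedup, and adding the -O3 flag brings the total to roughly 21x over the original code.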

Acknowledgement

Udacity Computer Vision Nanodegree

Author

Abhishek Tandon
