
Computer Vision Nanodegree

This repository contains my exercises and projects for the Computer Vision Nanodegree.

Project 1: Facial Keypoint Detection

In this project, I defined and trained a convolutional neural network to perform facial keypoint detection. The complete pipeline, sketched in code below, consists of three steps:

  1. Detect all the faces in an image using a Haar Cascade detector.
  2. Pre-process the detected face images: convert them to grayscale and transform them into Tensors of the expected input size.
  3. Use the trained model to detect facial keypoints on the image.
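
As a rough illustration, here is a minimal sketch of that pipeline in Python, assuming OpenCV for the Haar cascade detection and a trained PyTorch model; the file names (`haarcascade_frontalface_default.xml`, `keypoints_model.pt`, `example.jpg`), the 224x224 input size, and the model's output shape are illustrative assumptions, not taken from the project code.

```python
import cv2
import torch

# Hypothetical artifacts: the Haar cascade XML shipped with OpenCV, a saved
# trained keypoint model, and an example photo.
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
model = torch.load('keypoints_model.pt')   # hypothetical file with the trained net
model.eval()

image = cv2.imread('example.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 1. Detect all faces with the Haar cascade detector.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=2)

for (x, y, w, h) in faces:
    # 2. Pre-process: crop the face, scale to [0, 1], resize, convert to a Tensor.
    roi = gray[y:y + h, x:x + w] / 255.0
    roi = cv2.resize(roi, (224, 224))                       # assumed input size
    tensor = torch.from_numpy(roi).float().unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)

    # 3. Predict keypoints with the trained model.
    with torch.no_grad():
        keypoints = model(tensor).view(-1, 2)   # assumed output: (num_keypoints, 2)
```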

Training and Testing Data

The facial keypoints dataset consists of 5770 color images extracted from the YouTube Faces Dataset, a collection of videos of people on YouTube. These videos were fed through processing steps and turned into sets of image frames, each containing a single face and its associated keypoints.

  • 3462 of these images are training images, used to create a model that predicts keypoints.
  • 2308 of these images are test images, used to evaluate the accuracy of the trained model.

Results

[result images: detected facial keypoints]

Project 2: Image Captioning Project

In this project, I designed and trained a CNN-RNN model that automatically generates image captions. The training data come from the Microsoft Common Objects in COntext (MS COCO) dataset, a large-scale dataset for scene understanding that is commonly used to train and benchmark object detection, segmentation, and captioning algorithms. The model uses a CNN as an image "encoder": the CNN is first pre-trained for an image classification task, and its last hidden layer is used as the input to an RNN "decoder" that generates sentences. Given an input image, the model produces a complete sentence in natural language, as shown in the example below.

[example image with generated caption]
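
As a rough sketch of this encoder-decoder idea (not the project's exact code), the following PyTorch snippet pairs a pre-trained ResNet-50 encoder with an LSTM decoder; the choice of ResNet-50, the layer sizes, and the class and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """CNN 'encoder': a pre-trained classifier whose final layer is replaced."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)   # pre-trained on ImageNet
        modules = list(resnet.children())[:-1]      # drop the classification layer
        self.resnet = nn.Sequential(*modules)
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                       # keep the CNN frozen
            features = self.resnet(images)
        features = features.view(features.size(0), -1)
        return self.embed(features)                 # image feature vector

class DecoderRNN(nn.Module):
    """RNN 'decoder': generates word scores from the image feature and caption."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "word" of the input sequence.
        inputs = torch.cat((features.unsqueeze(1), self.embed(captions[:, :-1])), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)                     # scores over the vocabulary
```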

Results

Images with relatively accurate captions:

[example images]

Images with relatively inaccurate captions:

[example images]

Project 3: Landmark Detection & Robot Tracking (SLAM)

Project Overview

In this project, I implemented SLAM (Simultaneous Localization and Mapping) for a 2-dimensional world, combining a robot's sensor measurements and motion data, gathered over time, to build a map of its environment. SLAM provides a way to track a robot's location in the world in real time and to identify the positions of landmarks such as buildings, trees, rocks, and other world features. It is an active area of research in robotics and autonomous systems.
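
One common formulation of this problem is Graph SLAM, which accumulates motion and measurement constraints into an information matrix omega and vector xi and then solves mu = omega^-1 * xi for all poses and landmark positions at once. Below is a minimal one-dimensional sketch of that idea (the 2D case applies the same scheme to the x and y coordinates); the constraint weights, data layout, and usage values are illustrative assumptions, not taken from the project code.

```python
import numpy as np

def slam_1d(initial_pos, motions, measurements, num_landmarks):
    """Estimate 1D robot poses and landmark positions with Graph SLAM.

    motions[t] is the displacement between poses t and t+1;
    measurements[t] is a list of (landmark_index, relative_distance)
    pairs observed from pose t (one entry per pose).
    """
    num_poses = len(motions) + 1
    dim = num_poses + num_landmarks
    omega = np.zeros((dim, dim))   # information matrix of all constraints
    xi = np.zeros(dim)             # information vector

    # Constraint fixing the initial position.
    omega[0, 0] += 1.0
    xi[0] += initial_pos

    # Motion constraints between consecutive poses: x[t+1] - x[t] = motion.
    for t, motion in enumerate(motions):
        omega[t, t] += 1.0
        omega[t + 1, t + 1] += 1.0
        omega[t, t + 1] -= 1.0
        omega[t + 1, t] -= 1.0
        xi[t] -= motion
        xi[t + 1] += motion

    # Measurement constraints between a pose and a landmark: L[j] - x[t] = dist.
    for t, seen in enumerate(measurements):
        for lm, dist in seen:
            j = num_poses + lm     # index of landmark lm in omega/xi
            omega[t, t] += 1.0
            omega[j, j] += 1.0
            omega[t, j] -= 1.0
            omega[j, t] -= 1.0
            xi[t] -= dist
            xi[j] += dist

    # Solve for the best estimate of every pose and landmark at once.
    mu = np.linalg.solve(omega, xi)
    return mu[:num_poses], mu[num_poses:]

# Illustrative usage: a robot starting at 50, moving right, observing landmark 0.
poses, landmarks = slam_1d(50.0, motions=[10.0, 10.0],
                           measurements=[[(0, 25.0)], [(0, 15.0)], [(0, 5.0)]],
                           num_landmarks=1)
```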

Results

The final position of the robot is (24.66580405687, 82.86809855648303).

Below is the 2D robot world with landmarks (purple x's) and the robot (a red 'o'), localized using only the sensor and motion data collected by that robot.