
Word Level Sentence Generation using Deep Learning for Indian Sign Language

This project is part of my Master's major project work, guided by Prof. Dr. Rathna G N.

Project Timeline: August 2021 - June 2022

Abstract

Indian Sign Language (ISL) is the sign language used by the speech- and hearing-impaired population of the Indian subcontinent. It is difficult for the general population to communicate with speech-impaired people, primarily due to a lack of knowledge of sign language. To bridge this communication gap, this project attempts to recognize Indian Sign Language word by word and thereby generate entire sentences. The project focuses on one-way communication, where a hearing person can understand a speech-impaired person with the help of our trained models. We follow a three-step approach. First, key points of the face, pose, left hand, and right hand are extracted from the video frames. Second, the extracted key points are fed to a deep learning model, such as an LSTM or a Transformer encoder, to recognize which sign (i.e., word) the key points belong to. Finally, the recognized words are appended to a list to generate the desired sentence.

Built With

  • Python v3.7.10 or above
  • NumPy, Pandas
  • PyTorch v1.9.0 or above
  • CUDA 10.2

Dataset

The Indian Sign Language dataset we created replicates gestures from the freely available ISLRTC New Delhi YouTube channel. The playlist on that channel covers more than four thousand words and their associated gestures. Since there is only one sample video per word, we needed to record videos of our own for each word in order to build a dataset large enough for data-hungry deep learning models. We recorded 50 videos for each of 20 gestures, and each video consists of 20 sequential frames of the sign. With the help of the MediaPipe package, we then extracted keypoints from the face, pose, left hand, and right hand of each frame, giving a total of 1662 keypoint values per frame. That is, we get a tensor of shape 20x1662 per gesture (per video). This sequence of length 20 is fed to the sequential model to recognize that particular gesture.
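As a concrete illustration, here is a minimal sketch of how per-frame keypoints could be extracted with MediaPipe Holistic and stacked into a 20x1662 tensor. The function names and frame-reading logic are illustrative assumptions, not the project's exact code:

```python
import numpy as np
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten MediaPipe Holistic landmarks into one 1662-value vector
    (pose: 33*4, face: 468*3, each hand: 21*3); missing parts become zeros."""
    pose = np.array([[lm.x, lm.y, lm.z, lm.visibility]
                     for lm in results.pose_landmarks.landmark]).flatten() \
        if results.pose_landmarks else np.zeros(33 * 4)
    face = np.array([[lm.x, lm.y, lm.z]
                     for lm in results.face_landmarks.landmark]).flatten() \
        if results.face_landmarks else np.zeros(468 * 3)
    lh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.left_hand_landmarks.landmark]).flatten() \
        if results.left_hand_landmarks else np.zeros(21 * 3)
    rh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.right_hand_landmarks.landmark]).flatten() \
        if results.right_hand_landmarks else np.zeros(21 * 3)
    return np.concatenate([pose, face, lh, rh])  # shape: (1662,)

def video_to_sequence(video_path, num_frames=20):
    """Read `num_frames` frames from a video and stack their keypoint vectors."""
    sequence = []
    cap = cv2.VideoCapture(video_path)
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while len(sequence) < num_frames:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            sequence.append(extract_keypoints(results))
    cap.release()
    return np.stack(sequence)  # shape: (20, 1662)
```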

We also created a separate test dataset under different conditions: half of it was recorded in dim light and the other half in normal light. We recorded 10 videos for each gesture, giving a total of 200 videos for testing.

Model Architectures

We trained two different architectures on this dataset; minimal sketches of both are shown below.

  1. LSTM-RNN Classifier
  2. Transformer-Encoder Classifier

After training both architectures, we found that the Transformer-Encoder classifier gives considerably better results than the LSTM-RNN classifier.
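A minimal PyTorch sketch of such an LSTM-RNN baseline over the 20x1662 keypoint sequences (the hidden sizes here are illustrative assumptions, not the exact hyperparameters used in this work):

```python
import torch
import torch.nn as nn

class SignLSTMClassifier(nn.Module):
    """Baseline LSTM classifier over (batch, 20, 1662) keypoint sequences."""

    def __init__(self, num_classes=20, input_dim=1662, hidden_dim=128, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                   # x: (batch, 20, 1662)
        out, _ = self.lstm(x)               # out: (batch, 20, hidden_dim)
        return self.classifier(out[:, -1])  # classify from the last time step
```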

The architecture of the Transformer-Encoder classifier is shown in the figure below.

[Figure: Transformer-Encoder architecture diagram]
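A corresponding minimal sketch of the Transformer-Encoder classifier (dimensions, positional embedding, and mean pooling are assumptions for illustration, not the exact configuration used here):

```python
import torch
import torch.nn as nn

class SignTransformerEncoder(nn.Module):
    """Classify a (batch, 20, 1662) keypoint sequence into one of `num_classes` signs."""

    def __init__(self, num_classes=20, input_dim=1662, d_model=256,
                 nhead=8, num_layers=2, seq_len=20):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, d_model)  # project keypoints to model dim
        self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, d_model))  # learned positions
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x):          # x: (batch, 20, 1662)
        x = self.input_proj(x) + self.pos_embedding
        x = self.encoder(x)        # (batch, 20, d_model)
        x = x.mean(dim=1)          # average-pool over the time dimension
        return self.classifier(x)  # (batch, num_classes)

# Example: a batch of 4 gesture videos
logits = SignTransformerEncoder()(torch.randn(4, 20, 1662))
print(logits.shape)  # torch.Size([4, 20])
```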

Results obtained on the test dataset using the Transformer-Encoder architecture

Classification report -
[Figure: classification report on the test dataset]

Confusion matrix -
[Figure: confusion matrix on the test dataset]

It can be seen from the above figures that the worst-predicted classes are "no" and "see", whereas the best-predicted classes are "hat", "idea", and "I".

The Transformer-Encoder achieves 70% accuracy on the test dataset, a large improvement over the 58% obtained with the LSTM-RNN model.
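Metrics like the ones above can be computed with scikit-learn from the per-video predictions. A hedged sketch, assuming a trained `model` (e.g., the Transformer sketch above) and a PyTorch `test_loader` over the 200 test videos:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

# `test_loader` yields (sequence, label) batches; `model` is the trained classifier.
y_true, y_pred = [], []
model.eval()
with torch.no_grad():
    for x, y in test_loader:
        y_pred.extend(model(x).argmax(dim=-1).tolist())
        y_true.extend(y.tolist())

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```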

Real Time Testing

Our demo video of real-time sentence generation:

[Demo video: real-time sentence generation]
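A rough sketch of how such a real-time sentence-generation loop could be wired together, reusing the extraction and model sketches above. The webcam loop, the partial word list, and the 0.8 confidence threshold are assumptions, not the exact demo code:

```python
import collections
import cv2
import numpy as np
import torch

# Assumes extract_keypoints, mp_holistic, and SignTransformerEncoder from the sketches above.
WORDS = ["hat", "idea", "I", "no", "see"]  # excerpt of the 20 gesture labels

model = SignTransformerEncoder(num_classes=len(WORDS))
model.eval()

buffer = collections.deque(maxlen=20)  # rolling window of the last 20 frames' keypoints
sentence = []

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        buffer.append(extract_keypoints(results))
        if len(buffer) == 20:
            x = torch.from_numpy(np.stack(buffer)).float().unsqueeze(0)  # (1, 20, 1662)
            probs = torch.softmax(model(x)[0], dim=-1)
            conf, idx = probs.max(dim=-1)
            word = WORDS[idx.item()]
            # append only confident predictions that differ from the previous word
            if conf.item() > 0.8 and (not sentence or sentence[-1] != word):
                sentence.append(word)
        cv2.putText(frame, " ".join(sentence), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("ISL sentence generation", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```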

Demo video of our model tested on ISLRTC New Delhi data.

Video showing recognition of three words:

[Demo video: recognition of three words on ISLRTC New Delhi data]

Authors
