This notebook demonstrates an end-to-end workflow for sign language action detection using computer vision and deep learning.
The key steps in the tutorial are:
- Install and Import Dependencies - Setting up the necessary Python libraries
- Detect Face, Hand and Pose Landmarks - Using MediaPipe to detect keypoints from video frames
- Extract Keypoints - Retrieving the coordinates of the detected keypoints
- Set Up Folders for Data Collection - Creating directories to store training data
- Collect Keypoint Sequences - Capturing video frames and saving keypoint coordinate sequences
- Preprocess Data and Create Labels - Encoding the sequences, standardizing the data and generating one-hot labels
- Build and Train an LSTM Deep Learning Model - Developing and training an LSTM network for classification
- Make Sign Language Predictions - Using the trained model to predict actions on new data
- Save Model Weights - Storing the trained model parameters
- Evaluate with a Confusion Matrix - Analyzing model performance with a confusion matrix
- Test in Real Time - Demonstrating real-time prediction on live video input
By walking through these steps, the notebook provides an end-to-end demonstration of developing a deep learning model for sign language action detection from video sequences.
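The keypoint-extraction step above is typically implemented by flattening the MediaPipe Holistic landmarks into a single fixed-length vector per frame, zero-filling any landmark set that was not detected. A minimal sketch, assuming `results` is the object returned by MediaPipe Holistic's `process()` call (the 33/468/21 landmark counts are Holistic's defaults):

```python
import numpy as np

def extract_keypoints(results):
    """Flatten MediaPipe Holistic landmarks into one 1662-value vector.

    Missing detections are zero-filled so every frame yields the same shape:
    pose 33*4 (x, y, z, visibility) + face 468*3 + 2 hands * 21*3.
    """
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])
```

The zero-filling matters: frames where a hand leaves the camera view still produce a 1662-value vector, so sequences stack into a rectangular array for the LSTM.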
The following Python packages are required:

- tensorflow
- opencv-python
- mediapipe
- scikit-learn
- matplotlib
If you want to train the model on your own data, you can either use the live feed, performing each action 30 times, or train on video data such as the INCLUDE dataset (AI4Bharat) with only minor changes to the OpenCV capture code.
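The folder setup and preprocessing steps can be sketched with NumPy alone. The `<data_path>/<action>/<sequence>/<frame>.npy` layout, the action names, and the 30-frame / 1662-feature shapes are assumptions about how the collection step saves its data (here a temporary directory of random stand-in keypoints replaces real recordings):

```python
import os
import tempfile
import numpy as np

actions = ["hello", "thanks", "iloveyou"]  # example labels (assumption)
sequence_length = 30                       # frames per sequence
num_sequences = 5                          # recorded sequences per action
num_features = 1662                        # keypoint values per frame

# Create a throwaway data directory with random stand-in keypoints,
# mirroring a <data_path>/<action>/<sequence>/<frame>.npy layout.
data_path = tempfile.mkdtemp()
for action in actions:
    for seq in range(num_sequences):
        seq_dir = os.path.join(data_path, action, str(seq))
        os.makedirs(seq_dir, exist_ok=True)
        for frame in range(sequence_length):
            np.save(os.path.join(seq_dir, f"{frame}.npy"),
                    np.random.rand(num_features))

# Load each sequence as a (30, 1662) window and build integer labels,
# then one-hot encode the labels for categorical cross-entropy.
label_map = {action: idx for idx, action in enumerate(actions)}
sequences, labels = [], []
for action in actions:
    for seq in range(num_sequences):
        window = [np.load(os.path.join(data_path, action, str(seq),
                                       f"{frame}.npy"))
                  for frame in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)           # shape (15, 30, 1662)
y = np.eye(len(actions))[labels]  # one-hot labels, shape (15, 3)
```

Swapping the temporary directory for your own recordings (live feed or INCLUDE videos) is the only change needed to feed real data into the same pipeline.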
Once training is complete, you can save your model weights and run your own sign language detector!
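A minimal sketch of the LSTM classifier and weight saving, assuming TensorFlow/Keras and the sequence shapes above; the layer sizes and the `action.weights.h5` filename are illustrative choices, not the notebook's exact architecture:

```python
import numpy as np
import tensorflow as tf

num_actions = 3  # number of sign classes (assumption)

# Stacked LSTMs consume the (30, 1662) keypoint sequences; the final
# softmax produces one probability per action class.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 1662)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(128, return_sequences=False),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])

# After model.fit(X, y, ...), persist only the learned parameters.
model.save_weights("action.weights.h5")
```

`save_weights` stores just the parameters, so reloading requires rebuilding the same architecture and calling `model.load_weights("action.weights.h5")` before predicting on new sequences.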