A real-time hand gesture recognition system using computer vision and deep learning. This project detects and classifies various hand gestures through your webcam with MediaPipe and PyTorch.
The Hand Gesture Recognition system provides an intuitive interface for detecting and classifying hand gestures in real time. It uses MediaPipe for hand landmark detection and a custom-trained PyTorch CNN model for gesture classification.
Key features:
- Real-time Processing: Detect and classify hand gestures instantly from webcam input
- MediaPipe Integration: Precise hand landmark detection and tracking
- Custom CNN Model: Trained neural network for accurate gesture classification
- Visualization Tools: Real-time confidence scores and probability distribution
- Extensible Design: Easy addition of new gestures through custom training
- Data Augmentation: Tool for expanding training dataset with mirrored images
- Interactive Interface: Simple keyboard controls for capture and training modes
The system can recognize a set of trained gestures (such as "peace" and "love"), and more can be added through training.

Requirements:

- Python 3.11–3.12 (3.11 untested; not compatible with Python 3.10, and possibly not with 3.13)
- Webcam
- CUDA-capable GPU (optional, for faster training)
- Required packages listed in `requirements.txt`
1. Clone the repository:

   ```bash
   git clone https://github.com/SirAlexiner/HandGestureRecognition.git
   cd HandGestureRecognition
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv

   # On Windows
   venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Download the pre-trained model (`best_hand_gesture_classifier.pt`) from this repo, or follow the training instructions below.
To start the hand gesture recognition system:

```bash
python main.py
```
- Press 'c': Toggle capture mode for collecting training data for new gestures
- Press 'q': Quit the application
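The controls follow the standard OpenCV key-polling pattern. Here is a minimal, self-contained sketch of that pattern (the window title and loop structure are illustrative, not the exact code in `main.py`):

```python
import cv2

cap = cv2.VideoCapture(0)  # default webcam
capture_mode = False

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Hand Gesture Recognition", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('c'):
        capture_mode = not capture_mode  # toggle training-data capture
    elif key == ord('q'):
        break  # quit the application

cap.release()
cv2.destroyAllWindows()
```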
The application provides real-time visualization with:
- Hand landmark detection with wireframe overlay
- Bounding box around detected hand
- Gesture classification result with confidence score
- Probability distribution chart for all gestures
- Small hand ROI preview in the corner
The interface displays:
- Main camera view with hand detection
- ROI (Region of Interest) extracted around your hand
- Recognition result with confidence level
- Probability chart for all gesture classes
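As a rough illustration of how such a confidence readout and probability chart can be rendered with OpenCV, here is a hedged sketch (the helper name `draw_overlay`, the 0.8 color threshold, and the bar geometry are assumptions for illustration, not the project's actual drawing code):

```python
import cv2

def draw_overlay(frame, label, probs, class_names):
    """Draw a color-coded prediction label and a probability bar chart.

    `frame` is a BGR image, `probs` a sequence of per-class probabilities.
    """
    confidence = float(max(probs))
    # Green when confident, yellow otherwise (the 0.8 threshold is illustrative).
    color = (0, 255, 0) if confidence > 0.8 else (0, 255, 255)
    cv2.putText(frame, f"{label}: {confidence:.0%}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)

    # One horizontal bar per gesture class, scaled by its probability.
    for i, (name, p) in enumerate(zip(class_names, probs)):
        y = 60 + i * 22
        cv2.rectangle(frame, (10, y), (10 + int(p * 150), y + 14), color, -1)
        cv2.putText(frame, f"{name} {p:.2f}", (170, y + 12),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame
```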
To capture training data for new gestures:

1. Run the main application:

   ```bash
   python main.py
   ```

2. Press 'c' to enter capture mode
3. Enter the name of the gesture you want to capture
4. Make the gesture in front of your webcam
5. Press 'c' again to stop capturing
6. Repeat for different gestures

The captured images will be saved in the `data/custom/[gesture_name]` directory.
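The saving step amounts to writing each hand ROI into the per-gesture folder. A minimal sketch, assuming a timestamp-based naming scheme (the helper name and filename format are illustrative):

```python
import os
import time

import cv2

def save_capture(roi, gesture_name, data_root="data/custom"):
    """Write one hand-ROI frame into the per-gesture training folder.

    Illustrative only: the real capture logic lives in main.py and may
    use a different filename scheme or image format.
    """
    out_dir = os.path.join(data_root, gesture_name)
    os.makedirs(out_dir, exist_ok=True)
    # Timestamped names avoid collisions across capture sessions.
    path = os.path.join(out_dir, f"{gesture_name}_{time.time_ns()}.jpg")
    cv2.imwrite(path, roi)
```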
To mirror your training images (increases dataset variety):

```bash
python mirror.py
```

This will create mirrored versions of all training images, effectively doubling your dataset size and improving model robustness. It skips previously mirrored images, preventing duplicates.
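The mirroring idea is a horizontal flip per image plus a skip check. A minimal sketch, assuming a `_mirrored` filename suffix marks already-generated images (the actual convention in `mirror.py` may differ):

```python
import os

import cv2

DATA_DIR = "data/custom"
SUFFIX = "_mirrored"  # assumed naming convention, for illustration only

for gesture in os.listdir(DATA_DIR):
    folder = os.path.join(DATA_DIR, gesture)
    if not os.path.isdir(folder):
        continue
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if stem.endswith(SUFFIX):
            continue  # this file is itself a mirror; don't mirror it again
        target = os.path.join(folder, stem + SUFFIX + ext)
        if os.path.exists(target):
            continue  # already mirrored in a previous run
        img = cv2.imread(os.path.join(folder, name))
        if img is not None:
            cv2.imwrite(target, cv2.flip(img, 1))  # flip code 1 = horizontal
```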
Train and test the gesture recognition model on your collected data:

```bash
python train.py
```

This will:

- Load images from the `data/custom` directory
- Split data into training (70%), validation (20%), and test (10%) sets
- Train a convolutional neural network with the following architecture:
  - 5 convolutional layers with batch normalization
  - Dropout regularization
  - 3 fully connected layers
- Save the best model as `best_hand_gesture_model.pth` (training can be safely terminated after this file is saved)
- Evaluate the model on the test set
- Save the optimized model as `best_hand_gesture_classifier.pt`
- Generate a confusion matrix to visualize model performance
The training includes:
- Data augmentation (rotation, translation)
- Early stopping to prevent overfitting
- Learning rate scheduling
- Validation accuracy monitoring
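A hedged sketch of what such a loop typically looks like in PyTorch (the optimizer choice, learning rate, and patience values here are illustrative assumptions, not necessarily what `train.py` uses):

```python
import torch

def train_model(model, train_loader, val_loader, epochs=100, patience=10,
                device="cuda" if torch.cuda.is_available() else "cpu"):
    """Training loop with validation monitoring, LR scheduling, and early stopping."""
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the learning rate when validation accuracy plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=3)

    best_acc, epochs_without_improvement = 0.0, 0
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Validation accuracy drives both the scheduler and early stopping.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        val_acc = correct / total
        scheduler.step(val_acc)

        if val_acc > best_acc:
            best_acc, epochs_without_improvement = val_acc, 0
            torch.save(model.state_dict(), "best_hand_gesture_model.pth")
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping
```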
If you terminate training early, you **need** to run the test script to prepare the model for inference:

```bash
python test.py
```

This will:

- Evaluate the model on the test set
- Save the optimized model as `best_hand_gesture_classifier.pt`
- Generate a confusion matrix to visualize model performance

(These are the final steps of `train.py`.)
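A sketch of what these final steps can look like, assuming TorchScript tracing is what produces the "optimized" `.pt` file and that scikit-learn/matplotlib render the confusion matrix (both are assumptions for illustration):

```python
import matplotlib.pyplot as plt
import torch
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

def evaluate_and_export(model, test_loader, class_names, device="cpu"):
    """Evaluate on the test set, export a TorchScript model, plot a confusion matrix."""
    model.to(device).eval()
    y_true, y_pred = [], []
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            y_pred.extend(preds.tolist())
            y_true.extend(labels.tolist())

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"Test accuracy: {accuracy:.2%}")

    # TorchScript tracing is one way to produce an inference-ready .pt file.
    example = torch.randn(1, 1, 128, 128, device=device)
    torch.jit.trace(model, example).save("best_hand_gesture_classifier.pt")

    cm = confusion_matrix(y_true, y_pred)
    ConfusionMatrixDisplay(cm, display_labels=class_names).plot()
    plt.savefig("confusion_matrix.png")
```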
```
HandGestureRecognition/
├── main.py                           # Main application
├── train.py                          # Model training script
├── test.py                           # Model testing script
├── mirror.py                         # Data augmentation utility
├── requirements.txt                  # Required packages
├── best_hand_gesture_classifier.pt   # Pre-trained model
├── data/                             # Training data directory
│   └── custom/                       # Custom gesture data
│       ├── love/                     # Images for "love" gesture
│       ├── peace/                    # Images for "peace" gesture
│       └── ...                       # Other gesture folders
├── images/                           # README images
│   ├── demo.gif                      # Demo animation
│   ├── interface_example.jpg         # Interface screenshot
│   ├── confusion_matrix.png          # Confusion matrix screenshot
│   └── gestures/                     # Example gesture images
│       ├── peace.jpg                 # Peace sign example
│       ├── love.jpg                  # Love sign example
│       └── ...                       # Other gesture examples
├── LICENSE.md                        # License information
├── CODE_OF_CONDUCT.md                # Code of conduct
├── CONTRIBUTING.md                   # Contribution guidelines
└── README.md                         # This file
```
The recognition pipeline works as follows (a minimal sketch of the preprocessing step follows the list):

1. Hand Detection:
   - MediaPipe Hands detects and tracks 21 hand landmarks
   - Landmarks are normalized to hand size and position

2. Region of Interest:
   - A square region containing the hand is extracted with padding
   - The region is dynamically adjusted to follow hand movements

3. Preprocessing:
   - The hand landmarks are drawn as a white wireframe on a black background
   - This wireframe representation enhances gesture recognition robustness
   - The image is resized to 128x128 and normalized

4. Classification:
   - The CNN model predicts the gesture class from the preprocessed image
   - The model outputs probabilities for each possible gesture
   - The highest-probability gesture is selected as the prediction

5. Visualization:
   - Results are displayed with color-coded confidence scores
   - A probability chart shows relative confidence for all classes
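A minimal sketch of preprocessing step (3) using MediaPipe's drawing utilities; the exact ROI cropping and normalization in `main.py` may differ:

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
white = mp_drawing.DrawingSpec(color=(255, 255, 255), thickness=2)

def landmarks_to_wireframe(hand_landmarks, size=128):
    """Render MediaPipe hand landmarks as a white wireframe on a black square.

    Sketch of the preprocessing idea; the project's actual normalization
    (e.g. cropping to the hand ROI first) may differ.
    """
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    # draw_landmarks treats landmark coordinates as relative to the image size.
    mp_drawing.draw_landmarks(canvas, hand_landmarks,
                              mp_hands.HAND_CONNECTIONS,
                              landmark_drawing_spec=white,
                              connection_drawing_spec=white)
    gray = cv2.cvtColor(canvas, cv2.COLOR_BGR2GRAY)
    return gray.astype(np.float32) / 255.0  # add a channel dim for the 1x128x128 input
```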
The CNN model consists of:
- Input: 1x128x128 grayscale image
- 5 convolutional blocks with batch normalization and max pooling
- 3 fully connected layers with dropout regularization
- Output: N classes (where N is the number of gestures)
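A sketch of a model matching this description; the channel widths, hidden sizes, and dropout rate are illustrative assumptions, not the trained model's exact values:

```python
import torch
import torch.nn as nn

class HandGestureCNN(nn.Module):
    """5 conv blocks (batch norm + max pooling) followed by 3 FC layers with dropout."""

    def __init__(self, num_classes):
        super().__init__()
        channels = [1, 16, 32, 64, 128, 256]  # assumed widths, for illustration
        blocks = []
        for c_in, c_out in zip(channels, channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves spatial size: 128 -> 64 -> ... -> 4
            ]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):  # x: (batch, 1, 128, 128)
        return self.classifier(self.features(x))
```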
Full mathematical explanations and equations can be found in [MATH.md](MATH.md).
Planned improvements:

- Add support for two-handed gestures
- Implement gesture sequence recognition for commands
- Create a GUI for easier model training
- Add more pre-trained gestures
- Support for dynamic gestures (movements)
- Integration with applications via API
Contributions to the Hand Gesture Recognition project are welcome! To contribute, follow these steps:
1. Fork the repository on GitHub
2. Clone your forked repository to your local machine
3. Create a new branch for your changes
4. Make your changes and commit them with clear and concise messages
5. Push your changes to your forked repository
6. Submit a pull request to the original repository
Please adhere to the project's code of conduct and contribution guidelines provided in the CODE_OF_CONDUCT.md and CONTRIBUTING.md files, respectively.
For support, email sir_alexiner@hotmail.com or open an issue on GitHub.
This project is licensed under a custom non-commercial license (see LICENSE.md for the full terms):

You are free to download and use this code for educational, personal learning, or non-commercial purposes. While we encourage these uses, please note that using this project for commercial distribution, sales, or any form of monetization is not permitted.