Hand tracking and gesture recognition using two trained AlexNet models.
- HandTracker and GestureNet are trained on the HaGRID dataset.
- Both models use the AlexNet architecture from the paper "ImageNet Classification with Deep Convolutional Neural Networks".
- Install required Python packages:
pip install -r requirements.txt
- Run main.py to test the models using your webcam!
python main.py
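The script main.py already implements the webcam demo; as a rough illustration only (not the actual script), the sketch below shows how the two models could be chained at inference time. It assumes PyTorch, hypothetical checkpoint files "handtracker.pt" and "gesturenet.pt" saved as full pickled models, a 227x227 grayscale input, and a crop box given as (x1, y1, x2, y2) in the resized frame; the real main.py may differ on all of these points.

```python
# Hedged sketch of a webcam inference loop; filenames, box format, and
# preprocessing are assumptions, not the repository's actual code.
import cv2
import torch

# Assumed to be full models saved with torch.save(model, path).
hand_tracker = torch.load("handtracker.pt", map_location="cpu", weights_only=False)
gesture_net = torch.load("gesturenet.pt", map_location="cpu", weights_only=False)
hand_tracker.eval()
gesture_net.eval()

LABELS = ["like", "dislike", "palm", "no_gesture"]  # class labels listed later in this README

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, (227, 227))
    x = torch.from_numpy(resized).float().div(255.0).unsqueeze(0).unsqueeze(0)  # 1x1x227x227

    with torch.no_grad():
        # Predict the crop box, clamp it to the frame, then classify the crop.
        x1, y1, x2, y2 = hand_tracker(x)[0].clamp(0, 226).int().tolist()
        if x2 > x1 and y2 > y1:
            crop = cv2.resize(resized[y1:y2, x1:x2], (227, 227))
            crop_t = torch.from_numpy(crop).float().div(255.0).unsqueeze(0).unsqueeze(0)
            label = LABELS[gesture_net(crop_t).argmax(dim=1).item()]
            cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```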
HandTracker crops an image down to a single hand. For each gesture sample, the HaGRID dataset provides four pixel coordinates that define a crop box around the hand, so we can train a model that takes an image as input and regresses those four float values. We use the AlexNet architecture with an input image of size 227x227x1 and a 4-value output. See the diagram below for the HandTracker architecture.
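As a rough companion to the diagram, here is a minimal PyTorch sketch of an AlexNet-style regressor matching the description above (a 227x227x1 input and 4 crop-box values out). The layer sizes follow the standard AlexNet layout; the actual HandTracker implementation may differ.

```python
# Minimal sketch of an AlexNet-style crop-box regressor (assumed layout).
import torch
import torch.nn as nn

class HandTracker(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4),  # 4 crop-box coordinates
        )

    def forward(self, x):
        return self.regressor(self.features(x))

# Shape check: a 1x1x227x227 batch yields a 1x4 output.
print(HandTracker()(torch.zeros(1, 1, 227, 227)).shape)  # torch.Size([1, 4])
```

Training such a regressor would typically minimize an L2/MSE loss between the predicted and labeled box coordinates, though the repository's exact training setup is not shown here.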
GestureNet takes the output of HandTracker, so cropped images of the isolated hand gestures serve as its training data. We train this model to classify four labels: 'like', 'dislike', 'palm', and 'no_gesture'. See the diagram below for the GestureNet architecture.
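For illustration, the short sketch below builds a 4-way AlexNet classifier using torchvision's AlexNet as a stand-in backbone (an assumption; the repository may define its own layers), with the first convolution swapped to one channel so it accepts the same grayscale crops described above.

```python
# Hedged sketch of GestureNet as an AlexNet classifier with a 4-way head.
import torch
import torch.nn as nn
from torchvision.models import alexnet

LABELS = ["like", "dislike", "palm", "no_gesture"]

gesture_net = alexnet(num_classes=len(LABELS))
# Accept 1-channel (grayscale) crops instead of RGB.
gesture_net.features[0] = nn.Conv2d(1, 64, kernel_size=11, stride=4, padding=2)

# A cropped, resized hand image (1x1x227x227) maps to 4 class logits.
crop = torch.zeros(1, 1, 227, 227)
logits = gesture_net(crop)
print(LABELS[logits.argmax(dim=1).item()])
```

Training this classifier would typically use a cross-entropy loss over the four logits, with the crops taken from the HaGRID crop boxes (or from HandTracker's predictions at inference time).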