
Commit 065c487

Rakesh Ravi Shankar authored and msintov committed
VIC-11755: Adding the Sign Language Recognizer Example (#173)
1 parent db9712f commit 065c487


8 files changed: 626 additions & 2 deletions


anki_vector/util.py

Lines changed: 2 additions & 2 deletions
@@ -1046,8 +1046,8 @@ def apply_overlay(self, image: Image.Image) -> None:
         image_width, image_height = image.size
         remaining_width = image_width - self.width
         remaining_height = image_height - self.height
-        x1, y1 = remaining_height // 2, remaining_width // 2
-        x2, y2 = (image_height - (remaining_height // 2)), (image_width - (remaining_width // 2))
+        x1, y1 = remaining_width // 2, remaining_height // 2
+        x2, y2 = (image_width - (remaining_width // 2)), (image_height - (remaining_height // 2))
 
         for i in range(0, self.line_thickness):
             d.rectangle([x1 + i, y1 + i, x2 - i, y2 - i], outline=self.line_color)
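The fix swaps the margins so that horizontal coordinates come from the horizontal margin and vertical coordinates from the vertical margin. A standalone sketch of the corrected arithmetic (`centered_box` is a hypothetical helper, not part of `anki_vector.util`):

```python
from typing import Tuple


def centered_box(image_width: int, image_height: int,
                 box_width: int, box_height: int) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) of a box_width x box_height rectangle centred in the image.

    x coordinates come from the horizontal margin and y coordinates from the
    vertical margin, which is what the corrected apply_overlay() does.
    """
    remaining_width = image_width - box_width
    remaining_height = image_height - box_height
    x1, y1 = remaining_width // 2, remaining_height // 2
    x2, y2 = image_width - remaining_width // 2, image_height - remaining_height // 2
    return x1, y1, x2, y2


if __name__ == "__main__":
    # 640x360 is only an example frame size; 200x200 matches the dataset images
    print(centered_box(640, 360, 200, 200))  # (220, 80, 420, 280)
```

With the pre-fix code the same inputs give (80, 220, 280, 420): the margins are swapped, so the box is off-centre and its bottom edge (y = 420) falls outside a 360-pixel-tall frame. The two versions only agree when the frame is square.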
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Overview:

This project passes the robot's camera feed through a [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN) built to recognize American Sign Language hand signs. The network is built using [Keras](https://keras.io/) with a [TensorFlow](https://www.tensorflow.org/guide/keras) backend. It classifies each frame independently, predicting which sign (if any) is visible, so it cannot recognize more complex sign language gestures that span several frames.

>Network Architecture:
>ConvLayer -> MaxPoolLayer -> ConvLayer -> MaxPoolLayer -> ConvLayer -> Dropout -> Flatten -> Dense -> Dropout -> Dense

The network is built as a Keras [Sequential](https://keras.io/getting-started/sequential-model-guide/) model, which consists of a linear stack of layers of the following types:

ConvLayer (Convolutional layer): https://keras.io/layers/convolutional/

MaxPoolLayer (Max Pooling layer): https://keras.io/layers/pooling/

Flatten, Dense, Dropout layers: https://keras.io/layers/core/
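To see how this stack shrinks a 200x200 black-and-white input, the sketch below traces the feature-map size after each convolution and pooling layer. The 3x3 kernels, 2x2 pooling, "valid" (no) padding and the filter counts (32, 64, 64) are assumptions for illustration; this README does not state the values the project uses.

```python
from typing import List, Optional, Sequence, Tuple

# Layer order taken from the architecture line above (Dropout does not change shapes)
LAYER_ORDER = ["conv", "pool", "conv", "pool", "conv"]


def conv_output(size: int, kernel: int, stride: int = 1) -> int:
    """Length of one spatial dimension after a convolution with 'valid' (no) padding."""
    return (size - kernel) // stride + 1


def pool_output(size: int, pool: int, stride: Optional[int] = None) -> int:
    """Length after max pooling without padding; the stride defaults to the pool size."""
    stride = stride or pool
    return (size - pool) // stride + 1


def trace_shapes(input_size: int = 200, kernel: int = 3, pool: int = 2,
                 filters: Sequence[int] = (32, 64, 64)) -> Tuple[List[Tuple[str, int, int, int]], int]:
    """Return (layer, height, width, channels) after each conv/pool layer and the flattened length."""
    size, channels = input_size, 1  # one channel: the images are black and white
    conv_filters = iter(filters)
    shapes = []
    for layer in LAYER_ORDER:
        if layer == "conv":
            size = conv_output(size, kernel)
            channels = next(conv_filters)  # a conv layer outputs one channel per filter
        else:
            size = pool_output(size, pool)  # pooling keeps the channel count
        shapes.append((layer, size, size, channels))
    # Flatten turns the last feature map into a single vector for the Dense layers
    return shapes, size * size * channels


if __name__ == "__main__":
    layer_shapes, flattened = trace_shapes()
    for name, height, width, depth in layer_shapes:
        print(name, (height, width, depth))
    print("flatten", flattened)
```

Under these assumptions the last feature map is 46x46x64, so 135,424 values feed the first Dense layer, and most of the trainable parameters sit in that layer.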

The `data_gen.py` script builds or expands the dataset used to train and test the model. Each captured image is rotated by small random angles to generate several additional images (`image_multiplier`, 10 by default). The images are converted to 200x200 black-and-white images to reduce the complexity of the network. While capturing images from the feed, make sure your hand is positioned within the red square, which marks the region kept after cropping.
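A sketch of just the bookkeeping behind this expansion (no image processing): pick distinct rotation angles and give each generated image a `<label>_<index>.png` name from a per-label counter, as `data_gen.py` does with `stats.json`. `plan_augmented_images` is a hypothetical helper; its defaults mirror the script (10 images, rotations between -10 and 10 degrees), except that it also allows +10 degrees.

```python
import random
from typing import Dict, List, Optional, Tuple


def plan_augmented_images(label: str, stats: Dict[str, int], multiplier: int = 10,
                          min_rotation: int = -10, max_rotation: int = 10,
                          rng: Optional[random.Random] = None) -> List[Tuple[str, int]]:
    """Pick `multiplier` distinct rotation angles and a filename for each augmented image.

    Filenames follow the <label>_<index>.png pattern, and `stats` holds the next free
    index per label (the same bookkeeping that stats.json stores).
    """
    rng = rng or random.Random()
    # Distinct whole-degree angles; +1 makes max_rotation itself a possible angle
    angles = rng.sample(range(min_rotation, max_rotation + 1), multiplier)
    plan = []
    for angle in angles:
        index = stats.get(label, 0)
        plan.append((f"{label}_{index}.png", angle))
        stats[label] = index + 1  # the next image with this label gets the next index
    return plan


if __name__ == "__main__":
    counters = {"a": 3}
    for filename, angle in plan_augmented_images("a", counters, multiplier=3, rng=random.Random(0)):
        print(filename, angle)
    print(counters)  # {'a': 6}
```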

>Note: This project has not been tested on Windows or Linux. Currently, `data_gen.py` does not run on Windows because it depends on the `curses` library.

# Additional Dependencies:

Install the additional dependencies required for the project:
```
pip3 install keras
pip3 install numpy
pip3 install scipy
pip3 install scikit-learn
pip3 install tensorflow
```
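Package names on pip do not always match import names (scikit-learn is imported as `sklearn`), so a quick check before running the scripts can save a failed run. A minimal standard-library sketch; the module list is an assumption based on the commands above plus the Pillow (`PIL`) import in `data_gen.py`, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages above, plus Pillow (imported as PIL) used by data_gen.py.
# Note that scikit-learn is imported as "sklearn", not "scikit-learn".
REQUIRED_MODULES = ["keras", "numpy", "scipy", "sklearn", "tensorflow", "PIL"]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All additional dependencies are installed.")
```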

# Dataset:

### Import Dataset:

This project includes a sample dataset that can be used as a starting point to train and test the neural network.

Unpack the dataset:
```
unzip dataset.zip
```
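If the `unzip` command is not available, Python's standard `zipfile` module can do the same job. A minimal sketch, assuming `dataset.zip` sits in the current directory; `unpack_dataset` is a hypothetical helper, and calling `unpack_dataset()` from that folder extracts the archive in place.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_dataset(archive: str = "dataset.zip", destination: str = ".") -> List[str]:
    """Extract the archive into `destination` and return its top-level entries."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        # The first path component of every member, e.g. a folder name
        return sorted({Path(name).parts[0] for name in zf.namelist()})
```

`extractall()` writes whatever paths the archive contains, so only use it on archives you trust, such as the one shipped with this project.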

>Note: This dataset contains 200x200 black-and-white images of two hand signs ("a", "b"). The accompanying `stats.json` file records the number of images for each label.
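A sketch for checking that `stats.json` agrees with the images on disk. It assumes the images sit directly in the dataset folder next to `stats.json` and use the `<label>_<index>.png` names that `data_gen.py` writes; `count_labels`, `mismatched_labels` and `check_dataset` are hypothetical helpers.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List


def count_labels(filenames: Iterable[str]) -> Dict[str, int]:
    """Count images per label from <label>_<index>.png filenames; other files are ignored."""
    counts = Counter()
    for name in filenames:
        path = Path(name)
        label, sep, index = path.stem.rpartition("_")
        if path.suffix.lower() == ".png" and sep and index.isdigit():
            counts[label] += 1
    return dict(counts)


def mismatched_labels(stats: Dict[str, int], counts: Dict[str, int]) -> List[str]:
    """Labels whose stats.json value differs from the number of images found."""
    return sorted(label for label in stats.keys() | counts.keys()
                  if stats.get(label, 0) != counts.get(label, 0))


def check_dataset(folder: str) -> List[str]:
    """Compare stats.json with the .png files stored directly in `folder`."""
    root = Path(folder)
    stats = json.loads((root / "stats.json").read_text())
    counts = count_labels(p.name for p in root.iterdir() if p.is_file())
    return mismatched_labels(stats, counts)
```

`data_gen.py` stores the next free index per label, so the two only match if no images have been deleted by hand.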

### Generate Dataset:

![hand-sign](./example.png)

Use the command below to capture more images and expand the dataset. Adding more images can improve the network's performance because:

1. The network may not have seen reference images of your background, especially if you are testing against a noisy background.

2. The sample dataset contains images of only one hand, so the network might not recognize hands that look significantly different.

```
python3 data_gen.py --dataset_root_folder <path_to_folder>
```

>Note: To capture an image, hold the hand sign inside the red frame shown on the camera feed and press the letter key that labels that sign (for example, `a`). Images in the dataset are 200x200. Press Ctrl+C to stop capturing.

>Note: Along with images of hand signs, also capture images of the background with no hand sign in the frame, so that frames without a hand sign are classified correctly during prediction. Press the space bar to record a background image.
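The labelling rule from these notes, as `data_gen.py` implements it, can be pulled out into a small pure function that is easy to test without a robot or a terminal. `key_to_label` is a hypothetical name; it reproduces the script's checks on the key code returned by curses `getch()`.

```python
from typing import Optional

ERR = -1  # what curses getch() returns when no key is waiting in nodelay mode


def key_to_label(key: int) -> Optional[str]:
    """Map a key code read by data_gen.py to a dataset label, or None if the key is ignored."""
    if key == ord(" "):
        return "background"  # space bar records a background image
    if ord("a") <= key <= ord("z"):
        return chr(key)  # lower-case letter keys label hand signs
    return None  # no key pressed, upper-case letters, digits, etc.
```

Since only lower-case keys are accepted, make sure Caps Lock is off while capturing.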

# Run Project:

### Project Phases:

There are two main phases: `train` and `predict`. In the training phase, the neural network learns from a set of labeled hand sign images. Once the network has been trained, the `predict` option launches the robot's camera feed and classifies any visible hand sign. When a frame is classified as a letter, the robot speaks that letter out loud.
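This README does not spell out how a prediction becomes a spoken letter. One plausible rule, sketched here as an assumption: take the most likely class from the network's output for a frame, and stay silent when that class is the background or its score is below a threshold. `pick_sign`, the label order and the 0.9 threshold are hypothetical.

```python
from typing import Optional, Sequence


def pick_sign(probabilities: Sequence[float], labels: Sequence[str],
              threshold: float = 0.9) -> Optional[str]:
    """Return the letter to speak for one frame, or None to stay silent."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    # Index of the most likely class (argmax)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if labels[best] == "background" or probabilities[best] < threshold:
        return None
    return labels[best]
```

A non-None result could then be passed to `robot.behavior.say_text()`; skipping a letter that was already spoken for the previous frame would keep the robot from repeating itself.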

### Training:

Train the neural network on an image dataset to classify different hand signs.

```
python3 recognizer.py --train --dataset_root_folder <path_to_folder> [--model_config <path_to_config_file>] [--model_weights <path_to_weights_file>]
```

>Note: Use the `--model_config` and `--model_weights` flags to save the trained model's configuration and weights, so the model does not need to be retrained before predicting.
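Training and testing need the labelled images split into two sets. This README does not say how `recognizer.py` splits them, so the following is only a standard-library sketch of a per-label (stratified) split over `<label>_<index>.png` filenames; `split_by_label`, the 20% test fraction and the seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_label(filenames: Iterable[str], test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split <label>_<index>.png filenames into train and test lists, label by label."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        by_label[name.rsplit("_", 1)[0]].append(name)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        n_test = round(len(names) * test_fraction)  # keeps each label's share in the test set
        test.extend(names[:n_test])
        train.extend(names[n_test:])
    return train, test
```

Because `data_gen.py` saves the rotated copies of one capture under consecutive indices, a purely random split can put near-identical images in both sets and overstate test accuracy; grouping files by capture before splitting avoids that.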

### Prediction:

Use a trained neural network to recognize hand signs visible in the robot's field of view (inside the region of interest marked by the red square).

```
python3 recognizer.py --predict --model_config <path_to_config_file> --model_weights <path_to_weights_file>
```

>Note: Use the `--model_config` and `--model_weights` flags to load a previously trained model. If you do not have one, train the model first.

### Train and Predict:

Train a new model and use it for prediction in a single run:

```
python3 recognizer.py --train --predict --dataset_root_folder <path_to_folder>
```
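The project reads these options through `util.parse_command_args()`, whose definition is not part of this diff. The parser below is a hypothetical reconstruction of the flags used in the commands above (plus `--serial`, which `data_gen.py` reads), and `usage_problems` encodes the usage lines of this README as an assumption about which combinations are valid.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the options used in the commands above."""
    parser = argparse.ArgumentParser(description="Sign language recognizer (sketch)")
    parser.add_argument("--serial", help="serial number of the robot to connect to")
    parser.add_argument("--train", action="store_true", help="train the network")
    parser.add_argument("--predict", action="store_true", help="classify hand signs from the camera feed")
    parser.add_argument("--dataset_root_folder", help="folder with the images and stats.json")
    parser.add_argument("--model_config", help="file for the model configuration")
    parser.add_argument("--model_weights", help="file for the model weights")
    return parser


def usage_problems(args: argparse.Namespace) -> List[str]:
    """Check the flag combinations described in this README; an empty list means OK."""
    problems = []
    if not (args.train or args.predict):
        problems.append("choose --train, --predict, or both")
    if args.train and not args.dataset_root_folder:
        problems.append("--train needs --dataset_root_folder")
    if args.predict and not args.train and not (args.model_config and args.model_weights):
        problems.append("--predict on its own needs --model_config and --model_weights")
    return problems


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    problems = usage_problems(args)
    if problems:
        parser.error("; ".join(problems))  # prints the message and exits with status 2
    return args
```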
Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
# Copyright (c) 2019 Anki, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License in the file LICENSE.txt or at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Data generation script to build a training and test dataset.

A sample dataset is included in the project ("dataset.zip"). Unzip the archive and pass the
extracted folder with the --dataset_root_folder option to expand this dataset.

Use this script to build or expand the data needed to train the sign language recognition system.
"""

from concurrent.futures import CancelledError
import curses
import json
import os
from pathlib import Path
import random
import sys
import tempfile
import time

try:
    import numpy as np
except ImportError:
    sys.exit("Cannot import numpy: Do `pip3 install numpy` to install")

try:
    from PIL import Image
except ImportError:
    sys.exit("Cannot import from PIL: Do `pip3 install --user Pillow` to install")

try:
    from scipy import ndimage
except ImportError:
    sys.exit("Cannot import scipy: Do `pip3 install scipy` to install")

import anki_vector
import util


def data_capture(camera: anki_vector.camera.CameraComponent, stats: dict, root_folder: str) -> None:
    """Build an image dataset using the camera feed from Vector.

    Each captured camera image is converted to black and white and used to generate
    `image_multiplier` new images by rotating it by random angles. The key pressed to
    start the capture is used to label the generated images.
    """

    try:
        # TODO: curses works well with Mac OS and Linux, explore msvcrt for Windows
        terminal = curses.initscr()
        curses.cbreak()
        curses.noecho()
        terminal.nodelay(True)

        # The number of images to generate using the image captured as a seed
        image_multiplier = 10
        # The range of rotation angles (in degrees) used to generate new images
        min_rotation = -10
        max_rotation = 10

        print("------ capturing hand signs dataset, press ctrl+c to exit ------")
        while True:
            key = terminal.getch()
            if (ord("a") <= key <= ord("z")) or (key == ord(" ")):

                # Background images use the prefix "background" instead of " " in their filenames
                if key == ord(" "):
                    key = "background"
                else:
                    key = chr(key)

                # Pull the latest image from the camera; there is none until the first frame arrives
                latest_image = camera.latest_image
                if latest_image is not None and latest_image.raw_image is not None:
                    # Convert the image to black and white, as a NumPy array for scipy
                    black_white_array = np.asarray(latest_image.raw_image.convert("L"))

                    # Generate more images with random rotations (range() excludes max_rotation)
                    for rotation in random.sample(range(min_rotation, max_rotation), image_multiplier):
                        # A 2D image has a single rotation plane, so the default axes are used;
                        # reshape=False keeps the original image size
                        rotated_image_array = ndimage.rotate(black_white_array, rotation, reshape=False)

                        # Crop to the network's input size (200x200)
                        rotated_image = Image.fromarray(rotated_image_array)
                        cropped_image = util.crop_image(rotated_image,
                                                        util.NetworkConstants.IMAGE_WIDTH,
                                                        util.NetworkConstants.IMAGE_HEIGHT)

                        # Save the image as <label>_<index>.png and advance the label's counter
                        image_filename = f"{key}_{stats.get(key, 0)}.png"
                        stats[key] = stats.get(key, 0) + 1
                        cropped_image.save(os.path.join(root_folder, image_filename))

                    # Report which label was recorded
                    print(f"Recorded images for {key}\n\r")
    except (CancelledError, KeyboardInterrupt):
        pass
    finally:
        # Restore the terminal settings changed above
        curses.nocbreak()
        curses.echo()
        curses.endwin()


def main():
    args = util.parse_command_args()
    if not args.dataset_root_folder:
        args.dataset_root_folder = str(Path(tempfile.gettempdir(), "dataset"))
        print(f"No data folder defined, saving to {args.dataset_root_folder}")
        time.sleep(2)
    os.makedirs(args.dataset_root_folder, exist_ok=True)

    # Read existing stats or set new stats up
    stats_path = os.path.join(args.dataset_root_folder, "stats.json")
    if os.path.isfile(stats_path):
        with open(stats_path, "r") as stats_file:
            stats = json.load(stats_file)
    else:
        stats = {}

    with anki_vector.Robot(args.serial) as robot:
        try:
            # Add a rectangular overlay describing the portion of image that is used after cropping.
            # TODO: The rectangle overlay should feed in a full rect, not just a size
            frame_of_interest = anki_vector.util.RectangleOverlay(util.NetworkConstants.IMAGE_WIDTH, util.NetworkConstants.IMAGE_HEIGHT)
            robot.viewer.overlays.append(frame_of_interest)
            robot.camera.init_camera_feed()
            robot.viewer.show()
            data_capture(robot.camera, stats, args.dataset_root_folder)
        finally:
            with open(stats_path, "w") as stats_file:
                # Save the stats of the expanded dataset
                json.dump(stats, stats_file)

    # The terminal has already been restored by data_capture()
    print(f"Data collection done!\nData stored in {args.dataset_root_folder}")


if __name__ == '__main__':
    main()
Binary file not shown.