The PSL Dictionary dataset is a foundational resource for PSL learners, interpreters, and researchers. Developed to bridge communication gaps within the Deaf community, this dataset offers an extensive collection of video-recorded gestures, allowing individuals to effectively learn and communicate using PSL. The dictionary contains a rich collection of PSL signs corresponding to a wide range of words used in everyday language.
The PSL dataset is available as video-recorded gestures on the PSL Dictionary webpage, with each video offering a download feature. However, manually downloading the videos is a time-consuming and cumbersome task, especially for research purposes where a large dataset is needed to build a high-performing, robust system. Furthermore, researchers often need to modify or expand the dataset based on varying project requirements. These challenges motivated us to develop an automated system for efficiently downloading the necessary videos from the PSL Dictionary dataset.
This toolkit has been employed in the research studies published in the following papers:
- Hamza, H.M., Wali, A. Pakistan sign language recognition: leveraging deep learning models with limited dataset. Machine Vision and Applications 34, 71 (2023). https://doi.org/10.1007/s00138-023-01429-8
- Hamza, H.M., Wali, A. Pakistan Sign Language Recognition: From Videos to Images. SIViP 19, 682 (2025). https://doi.org/10.1007/s11760-025-04230-4
We used Selenium to scrape the PSL dictionary dataset. Since Selenium requires a WebDriver to execute automation tasks, setting up the WebDriver is a necessary step before using this tool.
Setting up a virtual environment is recommended so that all required packages are installed in isolation from the system-wide Python installation.

Create a virtual environment using the following command:

```bash
python -m venv venv
```

After creating the virtual environment, activate it using the following command:

```bash
source venv/bin/activate
```

Install the required packages using the following command:

```bash
pip install -r requirements.txt
```
The workflow of this automated toolkit is briefly described below and illustrated in the accompanying figure:
- Scrape the PSL Dictionary dataset: Use the vocabulary defined in `configs/vocab.json` to gather the URLs of the video gestures and save these URLs in `configs/urls.json`.
- Download the videos: Retrieve the video files using the URLs listed in `configs/urls.json`.
- Crop the videos: Process the videos to remove all foreground objects except the signer.
- Split the videos: Divide each video into two segments, each representing the same gesture. One segment becomes part of the training data, while the other is reserved for testing.
To download the videos, we first need their URLs. Therefore, the first step is to define the words to be fetched from the PSL dictionary. These words, along with their categories, are listed in JSON format in the configuration file `configs/vocab.json`.

Note: Each word should be written in the same format as found in the HTML. For example, if a word is capitalized in the HTML, it should be capitalized in `configs/vocab.json` too. You can use the browser's Inspect Element feature to check the format of a particular word.
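As a purely hypothetical illustration, assuming `configs/vocab.json` maps each category to its list of words (one plausible reading of the description above; the actual schema may differ), a minimal file could be generated like this:

```python
# Hypothetical illustration only: the real category and word names must match
# the PSL Dictionary website's HTML exactly, and the actual schema may differ.
import json

example_vocab = {
    "Category A": ["Word1", "Word2"],
    "Category B": ["Word3"],
}

with open("configs/vocab.json", "w", encoding="utf-8") as f:
    json.dump(example_vocab, f, indent=4, ensure_ascii=False)
```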
To get information about the defined vocabulary, run the `scripts/vocab_info.py` script as follows:

```bash
python scripts/vocab_info.py
```
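For intuition, assuming the category-to-words layout sketched above, such an info script might boil down to a summary like the following (an illustrative sketch, not the actual `scripts/vocab_info.py`):

```python
# Illustrative sketch only: summarize the vocabulary defined in configs/vocab.json.
import json

with open("configs/vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)

print(f"Categories: {len(vocab)}")
for category, words in vocab.items():
    print(f"  {category}: {len(words)} words")
print(f"Total words: {sum(len(words) for words in vocab.values())}")
```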
The scraper navigates through the PSL website searching for the words defined in `configs/vocab.json`. The scraping script is executed as follows:

```bash
python scripts/scrape.py
```
The complete procedure for scraping and collecting the video URLs via Selenium automation is detailed below (a minimal code sketch follows the list):

- Instantiate the Chrome WebDriver.
- Load the vocabulary configuration, consisting of the categories and their words, from the `configs/vocab.json` file.
- Initialize an empty dictionary, `results`, to store the video URLs.
- Iterate through each category `C` within the loaded vocabulary configuration.
  - Navigate to the category's webpage by forming the corresponding XPath.
  - Iterate through the list of words in `C`.
    - Navigate to the word's webpage by forming the corresponding XPath.
    - Retrieve the HTML source of the webpage.
    - Parse the HTML source with BeautifulSoup, extract the link to the video, and append it to `results`.
- Once all words are processed, close and quit the WebDriver.
- Save `results`, containing all links, in the `configs/urls.json` file.
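The sketch below follows the procedure above. The site URL, the XPath expressions, and the structure of `results` are placeholders; the actual page layout of the PSL Dictionary website dictates the real selectors, so the real `scripts/scrape.py` will differ in its details.

```python
# Minimal sketch of the scraping procedure. The URL, XPaths, and the shape of
# the results dictionary are placeholders, not the toolkit's actual values.
import json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()                       # instantiate the WebDriver

with open("configs/vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)                          # categories and their words

results = {}                                      # word -> video URL
driver.get("https://example-psl-dictionary.org")  # placeholder URL

for category, words in vocab.items():
    # open the category's page (placeholder XPath)
    driver.find_element(By.XPATH, f"//a[text()='{category}']").click()
    for word in words:
        # open the word's page (placeholder XPath)
        driver.find_element(By.XPATH, f"//a[text()='{word}']").click()
        soup = BeautifulSoup(driver.page_source, "html.parser")
        source = soup.find("source") or soup.find("video")
        if source and source.get("src"):
            results[word] = source["src"]         # extract the video link
        driver.back()                             # back to the category page
    driver.back()                                 # back to the category list

driver.close()
driver.quit()

with open("configs/urls.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=4)
```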
For each URL in `configs/urls.json`, we download the corresponding video file using the `requests` library, streaming the content to avoid loading it all into memory at once. The downloading script is executed as follows:

```bash
python scripts/download.py
```

Each downloaded video is saved as an MP4 file in the `dataset/original` folder.
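A minimal sketch of such a streamed download is shown below, assuming `configs/urls.json` maps each word to its video URL (the actual `scripts/download.py` may organize this differently):

```python
# Minimal sketch of a streamed download (assumes configs/urls.json maps each
# word to its video URL; the actual script may structure this differently).
import json
import os

import requests

os.makedirs("dataset/original", exist_ok=True)

with open("configs/urls.json", encoding="utf-8") as f:
    urls = json.load(f)

for word, url in urls.items():
    response = requests.get(url, stream=True)     # stream to keep memory usage low
    response.raise_for_status()
    with open(os.path.join("dataset/original", f"{word}.mp4"), "wb") as video_file:
        for chunk in response.iter_content(chunk_size=8192):
            video_file.write(chunk)
```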
The cropping script removes all foreground objects (e.g., the PSL logo, signage, and text in both English and Urdu), retaining only the signer, as illustrated below:

To execute the cropping script, use the following command. It processes all videos in the specified directory and saves the cropped videos in the `dataset/cropped` folder.

```bash
python scripts/crop.py
```
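If the overlays occupy fixed regions of the frame, one way to achieve this is to crop a fixed rectangle around the signer; the moviepy sketch below assumes exactly that, and its crop coordinates are placeholders rather than the toolkit's actual values.

```python
# Minimal sketch: crop a fixed rectangle around the signer with moviepy
# (the crop coordinates are placeholders, not the toolkit's actual values).
import os

from moviepy.editor import VideoFileClip
from moviepy.video.fx.all import crop

os.makedirs("dataset/cropped", exist_ok=True)

for filename in os.listdir("dataset/original"):
    clip = VideoFileClip(os.path.join("dataset/original", filename))
    # keep only the region containing the signer (placeholder coordinates)
    cropped = crop(clip, x1=100, y1=50, x2=clip.w - 100, y2=clip.h - 50)
    cropped.write_videofile(os.path.join("dataset/cropped", filename))
```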
Since each video contains a signer performing the same sign twice, we split it into two segments, each representing one instance of the gesture. The splitting script is executed as follows, processing all videos in the specified directory and saving the two split videos in the `dataset/train` and `dataset/test` folders, respectively:

```bash
python scripts/split.py
```
We applied a simple yet effective algorithm to achieve this. By calculating the midpoint of the video’s duration, we split it into two parts: the first segment covers the video from the start to the midpoint, and the second from the midpoint to the end. As each video begins and ends with a black screen, we trimmed these sections by skipping 0.7 seconds at the start and end of each video, as shown in the algorithm and the sketch below.
Algorithm: Splitting a video
1. Function SPLIT(video)
2. Set start = 0.7
3. Set end = video.duration − 0.7
4. Set mid = (start + end) / 2
5. Set v1 = video.subclip(start, mid)
6. Set v2 = video.subclip(mid, end)
7. v1.write_videofile(filename1)
8. v2.write_videofile(filename2)
9. End Function
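The pseudocode maps directly onto moviepy's API, which the `subclip` and `write_videofile` names suggest; a minimal sketch with placeholder output names:

```python
# Minimal Python equivalent of the algorithm above, using moviepy
# (output file names are placeholders).
from moviepy.editor import VideoFileClip

def split(video_path, train_path, test_path):
    video = VideoFileClip(video_path)
    start = 0.7                        # skip the black screen at the start
    end = video.duration - 0.7         # skip the black screen at the end
    mid = (start + end) / 2
    video.subclip(start, mid).write_videofile(train_path)  # first gesture
    video.subclip(mid, end).write_videofile(test_path)     # second gesture
```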
Five geometric and color-manipulation augmentation techniques were employed to increase the training data: brightness, noise, scaling, translation, and rotation. These techniques were implemented using the OpenCV-Python library with the following parameters: `rotation_angle = 10`, `scale_percent = 75`, and `translation_factor = 10`. The brightness was controlled by the variables `alpha` and `beta`, with values of `1.5` and `5`, respectively. The salt & pepper noise was added with a proportion of `0.05`. The figure below illustrates a frame from each type of augmented video.
The augmentation script is executed as follows; it applies the specified augmentation technique to the videos in the specified directory:

```bash
python scripts/augment.py
```
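For illustration, the per-frame operations corresponding to the stated parameters might look like the OpenCV sketch below; the helper name and the interpretation of `translation_factor` as a pixel offset are our assumptions, not the toolkit's actual code.

```python
# Illustrative per-frame sketch of the five augmentations with the stated
# parameters (helper name and parameter interpretations are assumptions).
import cv2
import numpy as np

def augment_frame(frame, technique):
    h, w = frame.shape[:2]
    if technique == "brightness":                 # alpha = 1.5, beta = 5
        return cv2.convertScaleAbs(frame, alpha=1.5, beta=5)
    if technique == "noise":                      # salt & pepper, proportion 0.05
        noisy = frame.copy()
        mask = np.random.rand(h, w)
        noisy[mask < 0.025] = 0                   # pepper
        noisy[mask > 0.975] = 255                 # salt
        return noisy
    if technique == "scaling":                    # scale_percent = 75
        return cv2.resize(frame, (w * 75 // 100, h * 75 // 100))
    if technique == "translation":                # translation_factor = 10 (pixels, assumed)
        M = np.float32([[1, 0, 10], [0, 1, 10]])
        return cv2.warpAffine(frame, M, (w, h))
    if technique == "rotation":                   # rotation_angle = 10 degrees
        M = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
        return cv2.warpAffine(frame, M, (w, h))
    return frame
```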
If you use this toolkit in your research, please consider citing our work:
```bibtex
@misc{psldataset2024,
  author       = {Hafiz Muhammad Hamza},
  title        = {Pakistan Sign Language (PSL) Dataset Toolkit},
  howpublished = {\url{https://github.com/hmhamza/psl-dataset}},
  year         = {2024}
}
```