Source Code for Eye-Tracking Using Gaze Data via WebGazer.js and Grounded Segment Anything 2.1 - ATTENTION: only works on Linux (e.g., via WSL)
If a WSL instance is not yet installed, it can be set up using the following command:
```
wsl --install -d Ubuntu-22.04
```
After installing Ubuntu, update the system and install some essential packages:
```bash
sudo apt update
sudo apt upgrade -y
sudo apt install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev \
    libnss3-dev libssl-dev liblzma-dev libreadline-dev libffi-dev wget \
    libsqlite3-dev libbz2-dev
```
The NVIDIA CUDA Toolkit must be installed from NVIDIA's official repository. The following steps are copied from NVIDIA's documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#wsl-installation
`curl` and `gnupg` are needed for key management:
```bash
sudo apt-get install -y ca-certificates curl gnupg
```
If the old key is still present, it can be removed:
```bash
sudo apt-key del 7fa2af80
```
Now, add the new key and CUDA repository:
```bash
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/3bf863cc.pub | sudo gpg --dearmor -o /usr/share/keyrings/cuda-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-wsl.list
```
Then, update the package list again and install the CUDA Toolkit:
```bash
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
```
Add the following lines to `~/.bashrc`:
```bash
export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}
export CUDA_HOME=/usr/local/cuda-12.1
```
Then run:
```bash
source ~/.bashrc
```
Download and install the desired Python version:
```bash
wget https://www.python.org/ftp/python/3.10.16/Python-3.10.16.tgz
tar -xvzf Python-3.10.16.tgz
cd Python-3.10.16
```
Configure and install:
```bash
./configure --enable-optimizations
make -j $(nproc)
sudo make altinstall
```
Verify the installation:
```bash
cd ..
python3.10 --version
```
Next, clone the Grounded-SAM-2 repository:
```bash
git clone https://github.com/IDEA-Research/Grounded-SAM-2.git
cd Grounded-SAM-2
```
Then, create and activate a virtual environment:
```bash
python3.10 -m venv GSAM
source GSAM/bin/activate
```
Install PyTorch with the matching CUDA 12.1 wheels:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```
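To confirm that PyTorch can see the GPU through WSL before continuing, a quick check (nothing here is specific to this repository):
```python
import torch

# Should print 2.5.1+cu121 and True if the CUDA 12.1 wheels were installed correctly
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```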
Download the model checkpoints and install the packages in editable mode:
```bash
cd checkpoints
bash download_ckpts.sh
cd ..
cd gdino_checkpoints
bash download_ckpts.sh
cd ..
pip install -e .
pip install --no-build-isolation -e grounding_dino
```
Install additional dependencies for `grounding_dino`:
```bash
cd grounding_dino
pip install -r requirements.txt
cd ..
```
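At this point the two editable installs can be sanity-checked with a minimal import test (module names as used by the Grounded-SAM-2 codebase; if your checkout differs, adjust accordingly):
```python
# Minimal check that both editable installs are importable
import sam2
import groundingdino

print("sam2:", sam2.__file__)
print("groundingdino:", groundingdino.__file__)
```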
To use the setup via VS Code in Windows, follow these steps:

- Install the following extensions in Visual Studio Code on Windows:
  - WSL Extension
  - Python
  - Python Environment Manager
- Close VS Code.
- In WSL, run the following command in the project folder to open VS Code in Windows:

  ```bash
  code .
  ```
- Install extensions for WSL/Ubuntu:
  - Install the Python extension
- Select the GSAM environment in VS Code.

You now have a working installation of Grounded SAM! If you have any unresolved questions, contact @ElectricUnit on GitHub.
To set up the eye-tracking pipeline itself:

- Install CUDA 12.1.
- (Windows) Make sure the Visual Studio Build Tools (e.g., version 2022) are installed: Link
- We recommend using Anaconda with Python 3.11.0 or higher.
- Install `torch==2.5.1+cu121`.
- Clone this repository:

  ```bash
  git clone https://github.com/M-Colley/eye-tracking-pipeline.git
  ```
- Run `pip install -r requirements.txt`.
- Follow the installation guide of Grounded Segment Anything 2 (use SAM 2.1) without Docker (environment variables, etc.).
- We use `sam2.1_hiera_large.pt`; download the weights from here and put them into the root of our directory (`functions_grounding_dino.py` looks for them there). A loading sketch follows this list.
- It could be helpful to use the Developer Command Prompt (unclear).
- Personalization: you will have to adapt your custom prompt for better results, depending on your use case.
- We also provide the necessary functions to work with 360-degree videos via yaw and pitch (`calculate_view(frame, yaw, pitch)`); see the usage sketch after this list. Attention: the encoding of the frames is highly important!
- The required quality of the detection can be altered by changing the values `box_threshold` and `text_threshold`: the higher the values, the fewer detections you will find, both true positives and false positives (see the example after this list).
- Attention: `get_color_for_class` has to be adapted per use case; a sketch follows this list.
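As a loading sketch for the checkpoint mentioned above: this assumes the standard SAM 2.1 config name and a checkpoint placed in the repository root; both paths should be checked against your checkout.
```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint in the repository root, where functions_grounding_dino.py looks for it;
# the config name is an assumption based on the SAM 2.1 release layout
checkpoint = "./sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint, device=device))
```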
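A hedged usage sketch for the 360-degree helper: only the signature `calculate_view(frame, yaw, pitch)` is documented here, so the import location, the angle units, and the return value are assumptions to verify against the code.
```python
import cv2
from functions_grounding_dino import calculate_view  # assumed import location

# Read one frame from a 360-degree (equirectangular) video; the path is a placeholder
cap = cv2.VideoCapture("my_360_video.mp4")
ok, frame = cap.read()
cap.release()

if ok:
    # Assumption: yaw and pitch in degrees, returning the corresponding perspective view
    view = calculate_view(frame, yaw=30.0, pitch=-10.0)
    cv2.imwrite("view.png", view)
```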
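To illustrate the threshold tuning, a minimal sketch using the standard GroundingDINO inference helpers; the config and checkpoint paths follow the Grounded-SAM-2 layout above, while the image path and prompt are placeholders:
```python
from groundingdino.util.inference import load_model, load_image, predict

# Paths are assumptions based on the Grounded-SAM-2 repository layout
model = load_model(
    "grounding_dino/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "gdino_checkpoints/groundingdino_swint_ogc.pth",
)
image_source, image = load_image("example_frame.jpg")  # placeholder image

# Raising box_threshold/text_threshold yields fewer detections overall:
# fewer false positives, but potentially missed true positives
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="car. pedestrian. traffic light.",  # example prompt, adapt to your use case
    box_threshold=0.35,
    text_threshold=0.25,
)
print(phrases)
```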
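Finally, one hypothetical way to adapt `get_color_for_class` per use case, as a simple class-name-to-BGR lookup (class names and colors are placeholders):
```python
# Hypothetical adaptation: fixed BGR color per prompt class
CLASS_COLORS = {
    "car": (0, 0, 255),
    "pedestrian": (0, 255, 0),
    "traffic light": (255, 0, 0),
}

def get_color_for_class(class_name: str) -> tuple[int, int, int]:
    # Fall back to gray for classes without an assigned color
    return CLASS_COLORS.get(class_name, (128, 128, 128))
```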