This repo contains the code required to perform 3D gaze tracking, including depth estimation, gaze estimation, 3D reconstruction, and gaze ray tracking. The 2D gaze tracking method can be found here.
Project data stored in Vanderbilt Box
This repo includes the source code and the data for the AIED2024 submission. Store the data located in the Vanderbilt Box shared folder within the empty data directory at the root of the GitHub repo. The data folder in Vanderbilt Box includes artefacts generated by the scripts along with the human annotations/corrections that are necessary for running the scripts.
Once you have finished downloading the data from Vanderbilt Box, please unzip the following files:
reid/cropped_faces/d1g1.zip -> reid/cropped_faces/d1g1/
reid/cropped_faces/d1g2.zip -> reid/cropped_faces/d1g2/
reid/cropped_faces/d2g1.zip -> reid/cropped_faces/d2g1/
reid/cropped_faces/d2g2.zip -> reid/cropped_faces/d2g2/
This was required because uploading the raw directories resulted in failed uploads.
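If you prefer to script this step, the following is a minimal extraction sketch in Python. It assumes the Box data was downloaded into a data/ directory at the repo root and that each archive unpacks directly into a matching folder; adjust the paths if your layout differs.

```python
import zipfile
from pathlib import Path

# Assumed location of the downloaded Box data inside the repo.
DATA_DIR = Path("data")

# The four face-crop archives listed above.
archives = [
    DATA_DIR / "reid" / "cropped_faces" / f"d{d}g{g}.zip"
    for d in (1, 2)
    for g in (1, 2)
]

for archive in archives:
    # Extract d1g1.zip -> reid/cropped_faces/d1g1/, etc.
    # (If an archive already contains a top-level d1g1/ folder,
    # extract into its parent directory instead.)
    target = archive.with_suffix("")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    print(f"Extracted {archive} -> {target}/")
```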
The gaze pipeline within the scripts/gaze
directory is composed of the following linear sequence:
reid.ipynb
: We started with a semi-automated ReID (a precursor of the ReID project in the OELE lab) that uses facial recognition from the deepface PyPI package to match each face with the larger body bounding boxes. Manual corrections were needed, since facial recognition was error-prone with child participants.

depth_estimation.ipynb
: Used ZoeDepth to perform metric depth estimation on the full RGB videos and generate depth videos.

gaze_estimation.ipynb
: Used L2CS-Net to perform gaze estimation on the videos using the face crops.

reconstruction3d.ipynb
: This is the bulk of the work, including the 3D gaze and Object-Of-Interest encoding. This file performs a process that takes the tracking, video, depth, and gaze data and reconstructs the entire 3D scene as a point cloud, 3D bounding boxes, and 3D gaze rays. Using a custom 3D video plotter named Plot3D, which could be replaced with the more reliable Open3D Visualizer, we can visualize the reconstructed 3D scene and its progression throughout the session. Within this code, we perform the following (a gaze ray tracking sketch follows this list):
- Use human-annotated positions of the floor and projector, placing these objects manually in the 3D scene with the Vision6D tool.
- Using the depth video, place the 3D gaze vector and bounding box matching each participant.
- Perform gaze ray tracking using Trimesh to identify what person or object a participant is looking at.
- Display the 3D scene with Plot3D and tag each video frame with PersonA->Object/PersonB information.
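The gaze ray tracking step boils down to a ray-mesh intersection query. Below is a minimal sketch of the idea with Trimesh; the scene objects, positions, and gaze vectors are hypothetical stand-ins, not the actual variables used in the notebook.

```python
import numpy as np
import trimesh

# Hypothetical scene objects: axis-aligned boxes standing in for a
# participant and the projector in the reconstructed 3D scene.
scene_objects = {
    "Student1": trimesh.creation.box(
        extents=(0.5, 0.5, 1.6),
        transform=trimesh.transformations.translation_matrix((1.0, 0.0, 0.8))),
    "Projector": trimesh.creation.box(
        extents=(2.0, 0.1, 1.5),
        transform=trimesh.transformations.translation_matrix((0.0, 3.0, 1.5))),
}

def gaze_target(eye_position, gaze_direction, objects):
    """Return the name of the closest object hit by the gaze ray, or None."""
    origin = np.asarray(eye_position, dtype=float).reshape(1, 3)
    direction = np.asarray(gaze_direction, dtype=float).reshape(1, 3)

    best_name, best_distance = None, np.inf
    for name, mesh in objects.items():
        # intersects_location returns hit points, ray indices, and triangle indices.
        locations, _, _ = mesh.ray.intersects_location(
            ray_origins=origin, ray_directions=direction)
        if len(locations):
            distance = np.linalg.norm(locations - origin, axis=1).min()
            if distance < best_distance:
                best_name, best_distance = name, distance
    return best_name

# Example: a gaze ray from a head position roughly toward the projector.
print(gaze_target([1.0, 0.0, 1.5], [-0.3, 1.0, 0.0], scene_objects))
```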
final_merge.ipynb
: Lastly, this script converts the results to the required output format: gaze targets pooled over N-second time windows, with gaze target IDs mapped to human-readable IDs (e.g., Taylor Swift instead of Student1).
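For reference, the time-window pooling amounts to grouping per-frame gaze targets into N-second windows. A rough sketch with pandas follows; the column names, the example data, and the ID mapping are illustrative, not the actual schema used in final_merge.ipynb.

```python
import pandas as pd

# Hypothetical per-frame output from the 3D reconstruction step.
frames = pd.DataFrame({
    "timestamp_s": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    "person": ["Student1"] * 6,
    "gaze_target": ["Projector", "Projector", "Student2",
                    "Student2", "Student2", "Projector"],
})

# Map anonymous IDs to human-readable names (illustrative mapping).
id_map = {"Student1": "Taylor Swift", "Student2": "Another Student"}
frames["person"] = frames["person"].map(id_map).fillna(frames["person"])
frames["gaze_target"] = frames["gaze_target"].map(id_map).fillna(frames["gaze_target"])

# Pool into N-second windows and keep the most frequent target per window.
N = 2
frames["window"] = (frames["timestamp_s"] // N).astype(int)
pooled = (frames.groupby(["window", "person"])["gaze_target"]
                .agg(lambda targets: targets.mode().iloc[0])
                .reset_index())
print(pooled)
```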
For future data analysis, I recommend the following:
- ReID. The ReID pipeline needs to be updated to take advantage of Ashwin et al.'s newer and more powerful ReID pipeline.
- Depth. Use the stereo depth cameras from Luxonis, such as the OAK-D Pro W, that were recently purchased by the OELE lab.
- 3D Video Plotting. Instead of using the poorly documented and custom-made Plot3D, I recommend using the better-established Open3D Visualizer with non-blocking visualization to support real-time updates from video (see the sketch after this list) -- docs found here.
- Gaze Estimation. L2CS-Net is convenient and easy to use; however, its performance is lacking, and there are likely much better-performing gaze estimation methods available now.
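The non-blocking visualization mentioned above follows the standard pattern from the Open3D documentation; a minimal sketch is below, where the per-frame point generator is a placeholder for whatever geometry the reconstruction pipeline produces.

```python
import numpy as np
import open3d as o3d

# Placeholder: random points standing in for the per-frame reconstructed scene.
def next_frame_points(num_points=1000):
    return np.random.uniform(-1.0, 1.0, size=(num_points, 3))

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(next_frame_points())

vis = o3d.visualization.Visualizer()
vis.create_window()
vis.add_geometry(pcd)

for _ in range(300):  # one iteration per video frame
    # Replace the point coordinates with the next frame's reconstruction.
    pcd.points = o3d.utility.Vector3dVector(next_frame_points())
    vis.update_geometry(pcd)
    # Non-blocking update: keeps the window responsive between frames.
    if not vis.poll_events():
        break
    vis.update_renderer()

vis.destroy_window()
```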