A computer vision project for multi-person 3D pose estimation. It combines Google's MoveNet model (TensorFlow) for 2D pose estimation with the MiDaS v2.1 Small model from Intel ISL for depth estimation.
Create and activate a new Anaconda environment called py311 with Python 3.11:
conda create -n py311 python=3.11
conda activate py311
Then, within this project's directory, install the dependencies:
pip install -r requirements.txt
Alternatively, create and activate a Poetry environment with Python 3.11.x. Within this project's directory, run:
poetry install
poetry shell
- Detect 2D keypoints for multiple individuals using the MoveNet multi-pose model.
- Estimate depth (Z) values for each detected 2D keypoint using the MiDaS depth estimation model.
- Apply smoothing to depth values to reduce jitter.
- Convert 2D keypoints into 3D coordinates using the pinhole camera model (leveraging camera intrinsics).
- Compute each person's confidence-weighted centroid from the X, Y, and Z coordinates of their keypoints.
- At each time point, calculate dyadic proximity as the Euclidean distance between individuals' centroids.
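
The geometric steps in the list above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the project's actual code: the function names (`smooth_depth`, `backproject`, `weighted_centroid`, `dyadic_proximity`) and the intrinsics values are hypothetical, the smoothing shown is a simple exponential moving average chosen for illustration (the project may use a different filter), and the Z values are assumed to already be on a consistent depth scale. MiDaS itself predicts relative inverse depth, so raw outputs would need converting first.

```python
import math


def smooth_depth(previous, current, alpha=0.5):
    """Exponential moving average of a depth value (assumed smoothing method)."""
    if previous is None:  # first frame: nothing to blend with
        return current
    return alpha * current + (1.0 - alpha) * previous


def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole model: map pixel (u, v) at depth z to camera-frame (X, Y, Z)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def weighted_centroid(points, scores):
    """Centroid of 3D keypoints, weighted by their confidence scores."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("confidence scores must sum to a positive value")
    return tuple(
        sum(p[i] * s for p, s in zip(points, scores)) / total for i in range(3)
    )


def dyadic_proximity(centroid_a, centroid_b):
    """Euclidean distance between two people's centroids."""
    return math.dist(centroid_a, centroid_b)


# Hypothetical intrinsics for a 640x480 camera (illustrative values only).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# Two keypoints for person A and one for person B, as (u, v, z).
person_a = [backproject(320, 240, 2.0, FX, FY, CX, CY),
            backproject(420, 240, 2.0, FX, FY, CX, CY)]
person_b = [backproject(320, 240, 3.0, FX, FY, CX, CY)]

centroid_a = weighted_centroid(person_a, [0.5, 0.5])
centroid_b = weighted_centroid(person_b, [1.0])
distance = dyadic_proximity(centroid_a, centroid_b)
```

Weighting the centroid by confidence means that low-confidence keypoints (for example, occluded joints) pull the centroid less than well-detected ones. The resulting distance is in whatever unit the Z values use.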