DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
This is the official PyTorch implementation of our CVPR 2024 paper "DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach".
The codebase has been tested with the following setup:
- Operating System: Ubuntu 20.04
- Python Version: 3.8
- GPU: 1x NVIDIA RTX 3090 with CUDA version 12.0
- Clone the repository:
  git clone https://github.com/cici203/DiffusionRegPose.git
  cd DiffusionRegPose
- Install the dependencies:
  pip install -r requirements.txt
- Compile the CUDA operators:
  cd models/diffusionregpose/ops
  python setup.py build install
  # unit test (every check should report True)
  python test.py
  cd ../../..
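Before compiling the CUDA operators it can help to compare the local environment with the tested setup. The following is a minimal sketch, not part of the repository: the function name `describe_environment` is hypothetical, and the script only reports what it finds rather than enforcing anything.

```python
import importlib.util
import platform
import sys

# Python version taken from the tested setup listed above.
TESTED_PYTHON = (3, 8)


def describe_environment():
    """Collect basic facts about the local environment as a dict."""
    info = {
        "os": platform.platform(),
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches_tested": sys.version_info[:2] == TESTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        # Imported lazily so the script also runs before the dependencies are installed.
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False, the operator build and the unit test are unlikely to succeed, so it is worth resolving that first.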
For CrowdPose data, please download the dataset from the CrowdPose download page. The crowdpose_dir should look like this:
|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ...
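The layout above can be checked before training. The following is a small sketch under the assumption that the directory contains exactly the `json/` and `images/` subfolders shown; the helper `check_crowdpose_dir` and the default path are hypothetical and not part of the repository.

```python
from pathlib import Path

# Annotation files named in the layout above.
EXPECTED_JSON = (
    "crowdpose_train.json",
    "crowdpose_val.json",
    "crowdpose_trainval.json",
    "crowdpose_test.json",
)


def check_crowdpose_dir(root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    json_dir = root / "json"
    image_dir = root / "images"
    for name in EXPECTED_JSON:
        if not (json_dir / name).is_file():
            problems.append(f"missing annotation file: json/{name}")
    if not image_dir.is_dir():
        problems.append("missing directory: images/")
    elif not any(image_dir.glob("*.jpg")):
        problems.append("images/ contains no .jpg files")
    return problems


if __name__ == "__main__":
    import sys

    # Hypothetical default; pass your own crowdpose_dir as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "crowdpose_dir"
    issues = check_crowdpose_dir(target)
    print("layout OK" if not issues else "\n".join(issues))
```

A missing `crowdpose_trainval.json` usually just means that util/crowdpose_concat_train_val.py has not been run yet.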
We have put our model checkpoints here.
Download the pretrained models from IDEA-Research/ED-Pose:
mkdir pretrain_models
# Put the pretrained models (e.g. edpose_r50_crowdpose.pth) into pretrain_models/
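A missing checkpoint is easier to catch before a run than inside one. The sketch below is illustrative only: it assumes the file names used in this README's training commands, and `pretrained_path` is a hypothetical helper rather than a function of the codebase.

```python
from pathlib import Path

# Backbone names and checkpoint file names as they appear in this README's training commands.
PRETRAINED = {
    "resnet50": "edpose_r50_crowdpose.pth",
    "swin_L_384_22k": "edpose_swinl_crowdpose.pth",
}


def pretrained_path(backbone, directory="pretrain_models"):
    """Return the expected checkpoint path for a backbone, or raise if the file is absent."""
    path = Path(directory) / PRETRAINED[backbone]
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it before training")
    return path


if __name__ == "__main__":
    for name in PRETRAINED:
        try:
            print(f"{name}: {pretrained_path(name)}")
        except FileNotFoundError as err:
            print(f"{name}: {err}")
```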
Training on the CrowdPose dataset
Single GPU
# For ResNet-50:
python main.py \
--output_dir "logs/crowdpose_r50" \
-c config/diffusionregpose.cfg.py \
--options batch_size=8 epochs=80 num_body_points=14 backbone="resnet50" \
--dataset_file="crowdpose" \
--pretrain_model_path "pretrain_models/edpose_r50_crowdpose.pth"
# For Swin-L:
python main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/diffusionregpose.cfg.py \
--options batch_size=8 epochs=80 num_body_points=14 backbone="swin_L_384_22k" \
--dataset_file="crowdpose" \
--pretrain_model_path "pretrain_models/edpose_swinl_crowdpose.pth"
Distributed Run
# For ResNet-50:
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/crowdpose_r50" \
-c config/diffusionregpose.cfg.py \
--options batch_size=8 epochs=80 num_body_points=14 backbone="resnet50" \
--dataset_file="crowdpose" \
--pretrain_model_path "pretrain_models/edpose_r50_crowdpose.pth"
# For Swin-L:
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/diffusionregpose.cfg.py \
--options batch_size=8 epochs=80 num_body_points=14 backbone="swin_L_384_22k" \
--dataset_file="crowdpose" \
--pretrain_model_path "pretrain_models/edpose_swinl_crowdpose.pth"
Evaluation on the CrowdPose dataset
ResNet-50
python main.py \
--output_dir "logs/crowdpose_r50" \
-c config/diffusionregpose.cfg.py \
--options batch_size=1 epochs=80 num_body_points=14 backbone="resnet50" \
--dataset_file="crowdpose" \
--pretrain_model_path "./models/diffusionregpose_r50_crowdpose.pth" \
--eval
Swin-L
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/diffusionregpose.cfg.py \
--options batch_size=1 epochs=80 num_body_points=14 backbone="swin_L_384_22k" \
--dataset_file="crowdpose" \
--pretrain_model_path "./models/diffusionregpose_swinl_crowdpose.pth" \
--eval
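The training and evaluation commands in this README differ only in the launcher, the backbone, the batch size, the checkpoint path, and the `--eval` flag. The sketch below assembles them from those parameters. It is an illustration, not part of the repository: `build_command` and its argument names are hypothetical, and it does not handle the `pretrain_model_path` environment variable exported for Swin-L evaluation.

```python
import shlex

# Config file used by every command in this README.
CONFIG = "config/diffusionregpose.cfg.py"


def build_command(backbone, tag, checkpoint, batch_size=8, num_gpus=1, evaluate=False):
    """Assemble the main.py invocation as a list of arguments."""
    if num_gpus > 1:
        # Same launcher as in the distributed commands of this README.
        cmd = ["python", "-m", "torch.distributed.launch",
               f"--nproc_per_node={num_gpus}", "main.py"]
    else:
        cmd = ["python", "main.py"]
    cmd += [
        "--output_dir", f"logs/crowdpose_{tag}",
        "-c", CONFIG,
        "--options", f"batch_size={batch_size}", "epochs=80",
        "num_body_points=14", f"backbone={backbone}",
        "--dataset_file=crowdpose",
        "--pretrain_model_path", checkpoint,
    ]
    if evaluate:
        cmd.append("--eval")
    return cmd


if __name__ == "__main__":
    train = build_command("resnet50", "r50",
                          "pretrain_models/edpose_r50_crowdpose.pth", num_gpus=4)
    evaluation = build_command("swin_L_384_22k", "swinl",
                               "./models/diffusionregpose_swinl_crowdpose.pth",
                               batch_size=1, evaluate=True)
    print(shlex.join(train))
    print(shlex.join(evaluation))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues when paths contain spaces.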
Our codebase is mainly built upon IDEA-Research/ED-Pose. We thank the authors for their excellent work.
@inproceedings{tan2024diffusionregpose,
title={{DiffusionRegPose}: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach},
author={Tan, Dayi and Chen, Hansheng and Tian, Wei and Xiong, Lu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2230--2239},
year={2024}
}