HACO is a framework for dense hand contact estimation that addresses the class and spatial imbalance issues of training on large-scale datasets. Building on 14 datasets that span hand-object, hand-hand, hand-scene, and hand-body interactions, we train a powerful model that learns dense hand contact in diverse scenarios.
- We recommend using an Anaconda virtual environment with Python >= 3.8.0 and PyTorch >= 1.11.0. Our latest HACO model is tested with Python 3.8.20, PyTorch 1.11.0, and CUDA 11.3.
- Set up the environment:

        # Initialize the conda environment
        conda create -n haco python=3.8 -y
        conda activate haco

        # Install PyTorch
        conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

        # Install all remaining packages
        pip install -r requirements.txt
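After installation, a quick check can confirm that the environment matches the tested configuration above (Python 3.8.x, PyTorch 1.11.0, CUDA 11.3):

```python
# Sanity check: versions should match the tested configuration above.
import sys
import torch

print(f"Python : {sys.version.split()[0]}")    # expected: 3.8.x
print(f"PyTorch: {torch.__version__}")         # expected: 1.11.0
print(f"CUDA   : {torch.version.cuda}")        # expected: 11.3
print(f"GPU available: {torch.cuda.is_available()}")
```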
You need to follow our directory structure for the data:
- For quick demo: see `docs/data_demo.md`.
- For evaluation: see `docs/data_eval.md`.
- For training: see `docs/data_train.md`.
Then, download the official checkpoints from HuggingFace and place them in the `release_checkpoint` directory by running (if this does not work, try OneDrive):

    bash scripts/download_haco_checkpoints.sh
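Once the script finishes, you can verify that the checkpoint files referenced by the commands below are in place. A minimal sketch (trim the list to the backbones you actually need):

```python
# Verify that downloaded checkpoints exist; the file names are taken from
# the demo and evaluation commands in this README.
from pathlib import Path

expected = [
    "haco_final_hamer_checkpoint.ckpt",
    "haco_final_vit_b_checkpoint.ckpt",
]
for name in expected:
    path = Path("release_checkpoint") / name
    print(f"{'OK' if path.is_file() else 'MISSING':7s} {path}")
```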
To run HACO on demo images using the WiLoR or MediaPipe hand detector, please run:

    python demo.py --backbone {BACKBONE_TYPE} --detector {DETECTOR_TYPE} --checkpoint {CKPT_PATH} --input_path {INPUT_PATH}
For example:

    # ViT-H (Default, HaMeR initialized) backbone
    python demo.py --backbone hamer --detector wilor --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt --input_path asset/example_images

    # ViT-B (ImageNet initialized) backbone
    python demo.py --backbone vit-b-16 --detector wilor --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt --input_path asset/example_images
Note: The demo includes post-processing to reduce noise in small or sparse contact areas.
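The exact post-processing lives in the demo code; as a rough illustration of the idea, small connected regions of contacted vertices can be discarded as noise. A minimal sketch, assuming per-vertex contact probabilities on the hand mesh and a vertex adjacency list (both hypothetical inputs here, not the demo's actual interface):

```python
# Illustrative spatial denoising: drop connected "contact" regions smaller
# than min_size vertices. Not the demo's actual implementation.
from collections import deque
import numpy as np

def suppress_small_regions(contact_prob, adjacency, thresh=0.5, min_size=10):
    """contact_prob: (V,) probabilities; adjacency: list of neighbor lists."""
    contact = contact_prob > thresh
    keep = np.zeros_like(contact)
    visited = np.zeros_like(contact)
    for seed in np.flatnonzero(contact):
        if visited[seed]:
            continue
        region, queue = [], deque([seed])  # BFS over one contacted region
        visited[seed] = True
        while queue:
            v = queue.popleft()
            region.append(v)
            for u in adjacency[v]:
                if contact[u] and not visited[u]:
                    visited[u] = True
                    queue.append(u)
        if len(region) >= min_size:  # keep only sufficiently large regions
            keep[region] = True
    return keep
```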
Before running the video demo, please download the example videos from HuggingFace and save them in `asset/example_videos` by running (if this does not work, try OneDrive):

    bash scripts/download_demo_example_videos.sh
To run HACO on demo videos using the WiLoR or MediaPipe hand detector, please run:

    python demo_video.py --backbone {BACKBONE_TYPE} --checkpoint {CKPT_PATH} --input_path {INPUT_PATH}
For example:

    # ViT-H (Default, HaMeR initialized) backbone
    python demo_video.py --backbone hamer --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt --input_path asset/example_videos

    # ViT-B (ImageNet initialized) backbone
    python demo_video.py --backbone vit-b-16 --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt --input_path asset/example_videos
Note: The demo includes post-processing for both spatial smoothing of small contact areas and temporal smoothing across frames to ensure stable contact predictions and hand detections.
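The temporal part can be pictured as a per-vertex low-pass filter over frames. A minimal sketch using an exponential moving average (the released demo may use a different filter):

```python
# Illustrative temporal smoothing of per-vertex contact probabilities.
import numpy as np

def smooth_contact_over_time(per_frame_probs, alpha=0.7):
    """per_frame_probs: (T, V) contact probabilities for T frames."""
    smoothed = np.empty_like(per_frame_probs)
    smoothed[0] = per_frame_probs[0]
    for t in range(1, len(per_frame_probs)):
        # Blend the running estimate with the current frame's prediction.
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * per_frame_probs[t]
    return smoothed
```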
To train HACO, please run:

    python train.py --backbone {BACKBONE_TYPE}
For example:

    # ViT-H (Default, HaMeR initialized) backbone
    python train.py --backbone hamer

    # ViT-B (ImageNet initialized) backbone
    python train.py --backbone vit-b-16
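Training tackles the class imbalance mentioned at the top; HACO's VCB Loss (see the acknowledgement of CB Loss below) builds on effective-number class reweighting. As a rough, hypothetical illustration of that reweighting idea applied to per-vertex binary contact (not the exact VCB formulation; see the paper for details):

```python
# Class-balanced BCE in the spirit of CB Loss (Cui et al.); NOT the exact
# VCB Loss used by HACO. Weights follow the effective number of samples.
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, targets, n_pos, n_neg, beta=0.9999):
    """logits, targets: (B, V); n_pos, n_neg: (V,) float label counts."""
    # Effective-number weight per class: (1 - beta) / (1 - beta**n).
    w_pos = (1.0 - beta) / (1.0 - beta ** n_pos.clamp(min=1.0))
    w_neg = (1.0 - beta) / (1.0 - beta ** n_neg.clamp(min=1.0))
    weights = targets * w_pos + (1.0 - targets) * w_neg
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```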
To evaluate HACO on the MOW dataset, please run:

    python test.py --backbone {BACKBONE_TYPE} --checkpoint {CKPT_PATH}
For example:

    # ViT-H (Default, HaMeR initialized) backbone
    python test.py --backbone hamer --checkpoint release_checkpoint/haco_final_hamer_checkpoint.ckpt

    # ViT-L (ImageNet initialized) backbone
    python test.py --backbone vit-l-16 --checkpoint release_checkpoint/haco_final_vit_l_checkpoint.ckpt

    # ViT-B (ImageNet initialized) backbone
    python test.py --backbone vit-b-16 --checkpoint release_checkpoint/haco_final_vit_b_checkpoint.ckpt

    # ViT-S (ImageNet initialized) backbone
    python test.py --backbone vit-s-16 --checkpoint release_checkpoint/haco_final_vit_s_checkpoint.ckpt

    # FPN (HandOccNet initialized) backbone
    python test.py --backbone handoccnet --checkpoint release_checkpoint/haco_final_handoccnet_checkpoint.ckpt

    # HRNet-W48 (ImageNet initialized) backbone
    python test.py --backbone hrnet-w48 --checkpoint release_checkpoint/haco_final_hrnet_w48_checkpoint.ckpt

    # HRNet-W32 (ImageNet initialized) backbone
    python test.py --backbone hrnet-w32 --checkpoint release_checkpoint/haco_final_hrnet_w32_checkpoint.ckpt

    # ResNet-152 (ImageNet initialized) backbone
    python test.py --backbone resnet-152 --checkpoint release_checkpoint/haco_final_resnet_152_checkpoint.ckpt

    # ResNet-101 (ImageNet initialized) backbone
    python test.py --backbone resnet-101 --checkpoint release_checkpoint/haco_final_resnet_101_checkpoint.ckpt

    # ResNet-50 (ImageNet initialized) backbone
    python test.py --backbone resnet-50 --checkpoint release_checkpoint/haco_final_resnet_50_checkpoint.ckpt

    # ResNet-34 (ImageNet initialized) backbone
    python test.py --backbone resnet-34 --checkpoint release_checkpoint/haco_final_resnet_34_checkpoint.ckpt

    # ResNet-18 (ImageNet initialized) backbone
    python test.py --backbone resnet-18 --checkpoint release_checkpoint/haco_final_resnet_18_checkpoint.ckpt
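Dense contact estimation is commonly scored with per-vertex precision, recall, and F1 against binarized ground truth. A minimal sketch of such metrics (the authoritative protocol is whatever `test.py` implements):

```python
# Common dense-contact metrics over binary per-vertex labels; illustrative
# only, not necessarily the exact protocol used by test.py.
import numpy as np

def contact_prf1(pred_prob, gt, thresh=0.5, eps=1e-8):
    """pred_prob, gt: (N, V) arrays; gt holds binary {0, 1} labels."""
    pred = (pred_prob > thresh).astype(np.float64)
    tp = (pred * gt).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1
```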
- `ImportError: cannot import name 'bool' from 'numpy'`: Please comment out the line `from numpy import bool, int, float, complex, object, unicode, str, nan, inf`.
- `np.int was a deprecated alias for the builtin int`: Use `int` by itself to avoid this error; doing so does not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. For details, please refer to here.
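For the second error, the replacement is mechanical; for example:

```python
import numpy as np

x = 3.7
# Before (fails on NumPy >= 1.24):
# i = np.int(x)
# After: use the builtin, or pick an explicit precision if it matters.
i = int(x)          # same behavior as the removed alias
i64 = np.int64(x)   # explicit 64-bit integer
```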
We thank:
- DECO for human-scene contact estimation.
- CB Loss for inspiration on VCB Loss.
- HaMeR for Transformer-based regression architecture.
    @article{jung2025haco,
      title   = {Learning Dense Hand Contact Estimation from Imbalanced Data},
      author  = {Jung, Daniel Sungho and Lee, Kyoung Mu},
      journal = {arXiv preprint arXiv:2505.11152},
      year    = {2025}
    }