Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W Harley, Leonidas Guibas, Cewu Lu
PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark designed to advance pose estimation in challenging, cluttered scenarios. PACE provides comprehensive real-world and simulated datasets for both instance-level and category-level tasks, featuring:
- 55K frames with 258K annotations across 300 videos
- 238 objects from 43 categories (rigid and articulated)
- An innovative annotation system using a calibrated 3-camera setup
- PACESim: 100K photo-realistic simulated frames with 2.4M annotations across 931 objects
We evaluate state-of-the-art algorithms on PACE for both pose estimation and object pose tracking, highlighting the benchmark's challenges and research opportunities.
- PACE rigorously tests the generalization of state-of-the-art methods in complex, real-world environments, enabling exploration and quantification of the 'simulation-to-reality' gap for practical applications.
- Try our latest pose estimator CPPF++ (TPAMI), which achieves state-of-the-art performance on PACE.
- 2024/07/22: PACE v1.1 uploaded to HuggingFace. Benchmark evaluation code released.
- 2024/03/01: PACE v1.0 released.
- Dataset Download
- Dataset Format
- Dataset Visualization
- Benchmark Evaluation
- Annotation Tools
- License
- Citation
Download the dataset from HuggingFace. Unzip all `tar.gz` files and place them under `dataset/pace` for evaluation. Large files are split into chunks; merge them first with, e.g., `cat test_chunk_* > test.tar.gz`.
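For reference, the merge-and-extract step can also be done in Python. This is a minimal sketch assuming the `test_chunk_*` naming used above; adjust the file names and output directory to your setup:

```python
import glob
import shutil
import tarfile

# Merge split archives (e.g., test_chunk_aa, test_chunk_ab, ...) back into a
# single tar.gz, then extract it into dataset/pace.
with open("test.tar.gz", "wb") as merged:
    for chunk in sorted(glob.glob("test_chunk_*")):
        with open(chunk, "rb") as f:
            shutil.copyfileobj(f, merged)

with tarfile.open("test.tar.gz", "r:gz") as tar:
    tar.extractall("dataset/pace")
```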
PACE follows the BOP format with the following structure (regex syntax):
```
camera_pbr.json
models(_eval|_nocs)?
├─ models_info.json
├─ (artic_info.json)?
├─ obj_${OBJ_ID}.ply
model_splits
├─ category
│  ├─ ${category}_(train|val|test).txt
│  ├─ (train|val|test).txt
├─ instance
│  ├─ (train|val|test).txt
(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)
├─ ${SCENE_ID}
│  ├─ scene_camera.json
│  ├─ scene_gt.json
│  ├─ scene_gt_info.json
│  ├─ scene_gt_coco_det_modal(_partcat|_inst)?.json
│  ├─ depth
│  ├─ mask
│  ├─ mask_visib
│  ├─ rgb
│  ├─ (rgb_nocs)?
```
Key components:

- `camera_pbr.json`: Camera parameters for PBR rendering; real camera parameters are stored in each scene's `scene_camera.json`.
- `models(_eval|_nocs)?`: 3D object models. `models` contains the original scanned meshes; `models_eval` contains uniformly sampled point clouds for evaluation (e.g., Chamfer distance); all models (except articulated parts, IDs 545–692) are recentered and normalized to a unit bounding box; `models_nocs` recolors vertices by their NOCS coordinates.
- `models_info.json`: Mesh metadata (diameter, bounds, scales in mm) and the mapping from `obj_id` to object `identifier`. Articulated objects have multiple parts, each with its own `obj_id`; their associations are stored in `artic_info.json`.
- `artic_info.json`: Part information for articulated objects, keyed by `identifier`.
- `obj_${OBJ_ID}.ply`: Mesh file for object `${OBJ_ID}`.
- `model_splits`: Model IDs for the train/val/test splits. Instance-level splits share IDs; category-level splits differ per category.
- `train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test`: Synthetic and real data for category- and instance-level training and validation, plus real-world test data for both.
- `${SCENE_ID}`: Each scene is stored in a separate folder (e.g., `000011`).
- `scene_camera.json`: Camera parameters.
- `scene_gt.json`: Ground-truth pose annotations (BOP format); see the loading sketch after this list.
- `scene_gt_info.json`: Meta information about ground-truth poses (BOP format).
- `scene_gt_coco_det_modal(_partcat|_inst)?.json`: 2D bounding boxes and instance segmentation in COCO format. `scene_gt_coco_det_modal_partcat.json` treats articulated parts as separate categories (for category-level evaluation); `scene_gt_coco_det_modal_inst.json` treats each object instance as a separate category (for instance-level evaluation). Note: there may be more categories than reported in the paper, as some objects appear only in synthetic data.
- `rgb`: Color images.
- `rgb_nocs`: Normalized object coordinates stored as RGB (mapped from `[-1, 1]` to `[0, 1]`), normalized w.r.t. the object bounding box (see this paper for the disambiguation method). Example normalization:

  ```python
  import numpy as np
  import trimesh

  # Load the object mesh and recenter it at its bounding-box center.
  mesh = trimesh.load_mesh(ply_fn)  # ply_fn: path to an obj_${OBJ_ID}.ply file
  bbox = mesh.bounds
  center = (bbox[0] + bbox[1]) / 2
  mesh.apply_translation(-center)

  # Scale by the largest extent and shift from [-0.5, 0.5] to [0, 1].
  extent = bbox[1] - bbox[0]
  colors = np.array(mesh.vertices) / extent.max()
  colors = np.clip(colors + 0.5, 0., 1.)
  ```

- `depth`: 16-bit depth images. Convert to meters by dividing by 10,000 (PBR) or 1,000 (real).
- `mask`: Object masks.
- `mask_visib`: Visible-part masks.
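Since the per-scene annotations follow the BOP conventions, a single frame can be loaded roughly as sketched below. This is a minimal sketch, not the official loader: the scene folder, frame key, 6-digit file naming, and the `cam_K`/`cam_R_m2c`/`cam_t_m2c` field names assume the standard BOP schema, and `imageio` is just one of several libraries that can read the 16-bit depth PNGs.

```python
import json
import numpy as np
import imageio.v2 as imageio

scene_dir = "dataset/pace/test/000011"  # example scene folder
im_id = "0"                             # frame key as used in the JSON files

# Camera intrinsics for this frame ("cam_K" is a row-major 3x3 matrix in BOP).
with open(f"{scene_dir}/scene_camera.json") as f:
    cam = json.load(f)[im_id]
K = np.array(cam["cam_K"]).reshape(3, 3)

# Ground-truth poses: rotation "cam_R_m2c" (3x3) and translation "cam_t_m2c" (mm).
with open(f"{scene_dir}/scene_gt.json") as f:
    gts = json.load(f)[im_id]
for gt in gts:
    R = np.array(gt["cam_R_m2c"]).reshape(3, 3)
    t = np.array(gt["cam_t_m2c"]) / 1000.0  # mm -> m
    print(gt["obj_id"], R, t)

# Depth in meters: divide by 10,000 for PBR renders or 1,000 for real captures.
depth = imageio.imread(f"{scene_dir}/depth/{int(im_id):06d}.png") / 1000.0
```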
A visualization script is provided to display ground-truth pose annotations and rendered 3D models. Run `visualizer.ipynb` to generate visualizations like the following:
Unzip all `tar.gz` files from HuggingFace and place them under `dataset/pace` for evaluation.
- Ensure the `bop_toolkit` submodule is cloned: after `git clone`, run `git submodule update --init`, or clone with `git clone --recurse-submodules git@github.com:qq456cvb/PACE.git`.
- Place prediction results at `prediction/instance/${METHOD_NAME}_pace-test.csv` (baseline results available here).
- Run:

  ```
  cd eval/instance
  sh eval.sh ${METHOD_NAME}
  ```
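Predictions for the instance-level track are expected in the BOP results CSV format that `bop_toolkit` consumes. Below is a minimal sketch of writing such a file; the dummy estimate, the `METHOD_NAME` placeholder, and the column layout follow the standard BOP convention rather than anything PACE-specific, so double-check against the baseline files linked above.

```python
import numpy as np

# A single dummy estimate; in practice, fill this list with your method's outputs.
estimates = [{
    "scene_id": 11, "im_id": 0, "obj_id": 1, "score": 0.9,
    "R": np.eye(3), "t": np.array([0.0, 0.0, 500.0]),  # rotation (3x3), translation in mm
    "time": 0.05,
}]

# Standard BOP results CSV layout:
# scene_id, im_id, obj_id, score, R (9 values, row-major), t (3 values, mm), time (s).
rows = ["scene_id,im_id,obj_id,score,R,t,time"]
for est in estimates:
    R = " ".join(f"{v:.6f}" for v in np.asarray(est["R"]).flatten())
    t = " ".join(f"{v:.6f}" for v in np.asarray(est["t"]).flatten())
    rows.append(f"{est['scene_id']},{est['im_id']},{est['obj_id']},"
                f"{est['score']:.4f},{R},{t},{est['time']:.3f}")

# Replace METHOD_NAME with your method's name, matching the ${METHOD_NAME} passed to eval.sh.
with open("prediction/instance/METHOD_NAME_pace-test.csv", "w") as f:
    f.write("\n".join(rows) + "\n")
```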
For category-level pose estimation:

- Place prediction results at `prediction/category/${METHOD_NAME}_pred.pkl` (baseline results available here).
- Download the ground-truth labels in the compatible `pkl` format from here and place them at `eval/category/catpose_gts_test.pkl`.
- Run:

  ```
  cd eval/category
  sh eval.sh ${METHOD_NAME}
  ```

Note: `category_names.txt` lists more categories (55) than reported in the paper because some categories lack real-world test images. The categories actually evaluated (47) are listed in `category_names_test.txt` (parts are counted separately). Ground-truth class IDs in `catpose_gts_test.pkl` use indices 1–55, matching `category_names.txt`.
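For reference, ground-truth class IDs can be mapped back to category names with a snippet like the following. This is a sketch that assumes `category_names.txt` lists one name per line and sits under `eval/category`; adjust the path to wherever the file lives in your checkout.

```python
# Map 1-indexed class IDs (as used in catpose_gts_test.pkl) to category names.
with open("eval/category/category_names.txt") as f:
    names = [line.strip() for line in f if line.strip()]

def class_name(class_id: int) -> str:
    return names[class_id - 1]  # class IDs are 1-based
```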
The source code for our annotation tools is organized as follows:
```
annotation_tool/
├─ inpainting
├─ obj_align
├─ obj_sym
├─ pose_annotate
├─ postprocessing
├─ TFT_vs_Fund
├─ utils
```
- `inpainting`: Inpaints markers to produce more realistic images.
- `obj_align`: Aligns objects to a consistent orientation within each category.
- `obj_sym`: Annotates object symmetry information.
- `pose_annotate`: Main pose annotation program.
- `postprocessing`: Post-processing steps (e.g., marker removal, extrinsics refinement/alignment).
- `TFT_vs_Fund`: Refines the 3-camera extrinsics.
- `utils`: Miscellaneous helper functions.
Detailed documentation is coming soon. We are working to make the annotation tools as user-friendly as possible for accurate 3D pose annotation.
MIT license for all contents except:
- Models with IDs 693–1260 are from SketchFab under CC BY. The original posts can be found at `https://sketchfab.com/3d-models/${OBJ_IDENTIFIER}` (find each identifier in `models_info.json`).
- Models 1165 and 1166 are from GrabCAD (identical geometry, different colors); see the GrabCAD license.
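To locate the original SketchFab post for a given model, the identifier can be read from `models_info.json`. A small sketch follows; the JSON path and the `identifier` field name are assumptions based on the format description above, so check the file for the exact key.

```python
import json

# Build the SketchFab URL for a SketchFab-sourced model (IDs 693-1260).
with open("dataset/pace/models/models_info.json") as f:
    models_info = json.load(f)

obj_id = "700"  # example object ID
identifier = models_info[obj_id]["identifier"]
print(f"https://sketchfab.com/3d-models/{identifier}")
```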
@inproceedings{you2023pace,
title={PACE: Pose Annotations in Cluttered Environments},
author={You, Yang and Xiong, Kai and Yang, Zhening and Huang, Zhengxiang and Zhou, Junwei and Shi, Ruoxi and Fang, Zhou and Harley, Adam W. and Guibas, Leonidas and Lu, Cewu},
booktitle={European Conference on Computer Vision},
year={2024},
organization={Springer}
}