PACE: Pose Annotations in Cluttered Environments
(ECCV 2024)

PACE Teaser


PACE (Pose Annotations in Cluttered Environments) is a large-scale benchmark designed to advance pose estimation in challenging, cluttered scenarios. PACE provides comprehensive real-world and simulated datasets for both instance-level and category-level tasks, featuring:

  • 55K frames with 258K annotations across 300 videos
  • 238 objects from 43 categories (rigid and articulated)
  • An innovative annotation system using a calibrated 3-camera setup
  • PACESim: 100K photo-realistic simulated frames with 2.4M annotations across 931 objects

We evaluate state-of-the-art algorithms on PACE for both pose estimation and object pose tracking, highlighting the benchmark's challenges and research opportunities.


Why a New Dataset?

  • PACE rigorously tests the generalization of state-of-the-art methods in complex, real-world environments, enabling exploration and quantification of the 'simulation-to-reality' gap for practical applications.

🔥News

  • Try our latest pose estimator CPPF++ (TPAMI), which achieves state-of-the-art performance on PACE.

Update Log

  • 2024/07/22: PACE v1.1 uploaded to HuggingFace. Benchmark evaluation code released.
  • 2024/03/01: PACE v1.0 released.

Dataset Download

Download the dataset from HuggingFace. Unzip all tar.gz files and place them under dataset/pace for evaluation. Large files are split into chunks; merge them with, e.g., cat test_chunk_* > test.tar.gz.
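
The chunk merging and extraction can also be scripted. Below is a minimal Python sketch, assuming the archives have already been downloaded into the current directory and that split files follow the test_chunk_* naming shown above:

    import glob
    import tarfile
    from pathlib import Path

    root = Path("dataset/pace")
    root.mkdir(parents=True, exist_ok=True)

    # Re-assemble split archives, e.g. test_chunk_* -> test.tar.gz
    chunks = sorted(glob.glob("test_chunk_*"))
    if chunks:
        with open("test.tar.gz", "wb") as merged:
            for chunk in chunks:
                with open(chunk, "rb") as part:
                    merged.write(part.read())

    # Extract every archive into dataset/pace
    for archive in glob.glob("*.tar.gz"):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(root)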


Dataset Format

PACE follows the BOP format with the following structure (regex syntax):

camera_pbr.json
models(_eval|_nocs)?
├─ models_info.json
├─ (artic_info.json)?
├─ obj_${OBJ_ID}.ply
model_splits
├─ category
│  ├─ ${category}_(train|val|test).txt
│  ├─ (train|val|test).txt
├─ instance
│  ├─ (train|val|test).txt
(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)
├─ ${SCENE_ID}
│  ├─ scene_camera.json
│  ├─ scene_gt.json
│  ├─ scene_gt_info.json
│  ├─ scene_gt_coco_det_modal(_partcat|_inst)?.json
│  ├─ depth
│  ├─ mask
│  ├─ mask_visib
│  ├─ rgb
│  ├─ (rgb_nocs)?

Key components:

  • camera_pbr.json: Camera parameters for PBR rendering; real camera parameters are in each scene's scene_camera.json.
  • models(_eval|_nocs)?: 3D object models. models contains original scanned meshes; models_eval has uniformly sampled point clouds for evaluation (e.g., Chamfer distance); all models (except articulated parts, ID 545–692) are recentered and normalized to a unit bounding box. models_nocs recolors vertices by NOCS coordinates.
    • models_info.json: Mesh metadata (diameter, bounds, scales in mm), and mapping from obj_id to object identifier. Articulated objects have multiple parts, each with a unique obj_id; associations are in artic_info.json.
    • artic_info.json: Part information for articulated objects, keyed by identifier.
    • obj_${OBJ_ID}.ply: Mesh file for object ${OBJ_ID}.
  • model_splits: Model IDs for train/val/test splits. Instance-level splits share IDs; category-level splits differ per category.
  • train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test: Synthetic and real data for category/instance-level training and validation; real-world test data for both.
    • ${SCENE_ID}: Each scene in a separate folder (e.g., 000011).
      • scene_camera.json: Camera parameters.
      • scene_gt.json: Ground-truth annotations (BOP format).
      • scene_gt_info.json: Meta info about ground-truth poses (BOP format).
      • scene_gt_coco_det_modal(_partcat|_inst)?.json: 2D bounding box and instance segmentation in COCO format.
        • scene_gt_coco_det_modal_partcat.json: Treats articulated parts as separate categories (for category-level evaluation).
        • scene_gt_coco_det_modal_inst.json: Treats each object instance as a separate category (for instance-level evaluation). Note: There may be more categories than reported in the paper, as some objects appear only in synthetic data.
      • rgb: Color images.
      • rgb_nocs: Normalized object coordinates as RGB (mapped from [-1, 1] to [0, 1]), normalized w.r.t. object bounding box. Example normalization:
        # ply_fn is the path to an obj_${OBJ_ID}.ply mesh from models/
        import numpy as np
        import trimesh

        mesh = trimesh.load_mesh(ply_fn)
        bbox = mesh.bounds                      # axis-aligned (min, max) corners
        center = (bbox[0] + bbox[1]) / 2
        mesh.apply_translation(-center)         # move the bounding-box center to the origin
        extent = bbox[1] - bbox[0]
        colors = np.array(mesh.vertices) / extent.max()  # scale by the longest side
        colors = np.clip(colors + 0.5, 0., 1.)           # map roughly [-0.5, 0.5] to [0, 1]
        See this paper for the disambiguation method.
      • depth: 16-bit depth images. Convert to meters by dividing by 10,000 (PBR) or 1,000 (real); see the loading sketch after this list.
      • mask: Object masks.
      • mask_visib: Visible part masks.
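
As a quick illustration of how these files fit together, here is a minimal Python sketch that loads the intrinsics, one ground-truth pose, and the metric depth for a single frame. The scene and image ids are placeholders, and the field names (cam_K, cam_R_m2c, cam_t_m2c) and zero-padded image file names follow the standard BOP conventions assumed here:

    import json
    import numpy as np
    import imageio.v2 as imageio  # any 16-bit-aware PNG reader works

    scene_dir = "dataset/pace/test/000011"  # example scene from the structure above
    im_id = "0"                             # image ids are the keys of the scene JSONs

    # Camera intrinsics for this frame
    with open(f"{scene_dir}/scene_camera.json") as f:
        cam = json.load(f)[im_id]
    K = np.array(cam["cam_K"]).reshape(3, 3)

    # First ground-truth object pose for this frame (model-to-camera, BOP convention)
    with open(f"{scene_dir}/scene_gt.json") as f:
        gt = json.load(f)[im_id][0]
    R = np.array(gt["cam_R_m2c"]).reshape(3, 3)
    t = np.array(gt["cam_t_m2c"]) / 1000.0  # mm -> m

    # Depth to meters: divide by 1,000 for real captures, 10,000 for PBR renders
    depth = imageio.imread(f"{scene_dir}/depth/{int(im_id):06d}.png").astype(np.float32)
    depth_m = depth / 1000.0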

Dataset Visualization

A visualization script is provided to display ground-truth pose annotations and rendered 3D models. Run visualizer.ipynb to generate these visualizations.


Benchmark Evaluation

Unzip all tar.gz files from HuggingFace and place them under dataset/pace for evaluation.

Instance-Level Pose Estimation

  • Ensure the bop_toolkit submodule is cloned: after git clone, run git submodule update --init, or use git clone --recurse-submodules git@github.com:qq456cvb/PACE.git.
  • Place prediction results at prediction/instance/${METHOD_NAME}_pace-test.csv (baseline results available here); see the results-file sketch after these steps.
  • Run:
    cd eval/instance
    sh eval.sh ${METHOD_NAME}
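
Because evaluation goes through bop_toolkit, the prediction CSV is expected to follow the standard BOP results format: one row per estimated pose with scene_id, im_id, obj_id, score, a flattened 3x3 rotation R, a translation t in millimetres, and the runtime. A minimal sketch for writing such a file (the method name and pose values are placeholders):

    import numpy as np

    method = "mymethod"  # placeholder method name
    rows = [
        # (scene_id, im_id, obj_id, score, R as 3x3, t in mm, time in seconds)
        (11, 0, 1, 0.95, np.eye(3), np.array([0.0, 0.0, 500.0]), 0.2),
    ]

    with open(f"prediction/instance/{method}_pace-test.csv", "w") as f:
        f.write("scene_id,im_id,obj_id,score,R,t,time\n")
        for scene_id, im_id, obj_id, score, R, t, time in rows:
            R_str = " ".join(f"{v:.6f}" for v in R.flatten())
            t_str = " ".join(f"{v:.6f}" for v in t)
            f.write(f"{scene_id},{im_id},{obj_id},{score},{R_str},{t_str},{time}\n")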

Category-Level Pose Estimation

  • Place prediction results at prediction/category/${METHOD_NAME}_pred.pkl (baseline results available here).
  • Download ground-truth labels in compatible pkl format from here and place at eval/category/catpose_gts_test.pkl.
  • Run:
    cd eval/category
    sh eval.sh ${METHOD_NAME}

Note: There are more categories (55) in category_names.txt than reported in the paper, as some categories lack real-world test images. The actual evaluation categories (47) are in category_names_test.txt (parts are counted separately). Ground-truth class IDs in catpose_gts_test.pkl use indices 1–55, matching category_names.txt.
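
For reference, a class id from catpose_gts_test.pkl can be mapped back to its name as in the short sketch below; the path to category_names.txt is an assumption, so adjust it to wherever the file lives in your checkout:

    # Map a 1-based class id from catpose_gts_test.pkl to its category name.
    with open("eval/category/category_names.txt") as f:  # path assumed
        names = [line.strip() for line in f if line.strip()]

    class_id = 7                # example id in 1..55
    print(names[class_id - 1])  # ids are 1-based, hence the -1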


Annotation Tools

The source code for our annotation tools is organized as follows:

annotation_tool/
├─ inpainting
├─ obj_align
├─ obj_sym
├─ pose_annotate
├─ postprocessing
├─ TFT_vs_Fund
├─ utils
  • inpainting: Inpaints markers for more realistic images.
  • obj_align: Aligns objects to a consistent orientation within categories.
  • obj_sym: Annotates object symmetry information.
  • pose_annotate: Main pose annotation program.
  • postprocessing: Post-processing steps (e.g., marker removal, extrinsics refinement/alignment).
  • TFT_vs_Fund: Refines 3-camera extrinsics.
  • utils: Miscellaneous helper functions.

Detailed documentation is coming soon. We are working to make the annotation tools as user-friendly as possible for accurate 3D pose annotation.


License

MIT license for all contents except:

  • Models with IDs 693–1260 are from SketchFab under CC BY. Original posts: https://sketchfab.com/3d-models/${OBJ_IDENTIFIER} (find the identifier in models_info.json; see the lookup sketch after this list).
  • Models 1165 and 1166 are from GrabCAD (identical geometry, different colors). See GrabCAD license.
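
A small sketch for locating the original SketchFab post of a given model; the key that stores the identifier in models_info.json is assumed here, so check the file for the actual field name:

    import json

    with open("dataset/pace/models/models_info.json") as f:
        info = json.load(f)

    obj_id = 700                                   # any model with ID in 693-1260
    identifier = info[str(obj_id)]["identifier"]   # key name assumed
    print(f"https://sketchfab.com/3d-models/{identifier}")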

Citation

@inproceedings{you2023pace,
    title={PACE: Pose Annotations in Cluttered Environments},
    author={You, Yang and Xiong, Kai and Yang, Zhening and Huang, Zhengxiang and Zhou, Junwei and Shi, Ruoxi and Fang, Zhou and Harley, Adam W. and Guibas, Leonidas and Lu, Cewu},
    booktitle={European Conference on Computer Vision},
    year={2024},
    organization={Springer}
}
