
DTTDNet: Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset

arXiv · Paper with Code · DTTD-Mobile Dataset · 👀 CVPRW 2025 MAI · 🧠 ICMLW 2024 DMLR

This repository is the official implementation of the 2025 CVPR Workshop (Mobile AI) paper "Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset".

Are current 3D object tracking methods truly robust enough for low-fidelity depth sensors like the iPhone LiDAR?

We introduce DTTD-Mobile, a new benchmark built on real-world data captured from mobile devices. We evaluate several popular methods, including BundleSDF, ES6D, MegaPose, and DenseFusion, and highlight their limitations in this challenging setting. Going a step further, we propose DTTDNet, with a Fourier-enhanced MLP and a two-stage attention-based fusion across RGB and depth, making 6DoF pose estimation more robust even when the input is noisy, blurry, or partially occluded.
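For intuition, here is a minimal PyTorch sketch of a Fourier/filter-enhanced MLP block in the spirit described above. This is our illustrative interpretation, not the repository's exact module: the class name, tensor shapes, and GFNet-style frequency filtering are assumptions.

import torch
import torch.nn as nn

class FilterEnhancedMLP(nn.Module):
    """Illustrative sketch: reweight token features in the frequency
    domain with a learnable filter, then refine with a standard MLP.
    num_tokens is fixed at construction time in this sketch."""
    def __init__(self, num_tokens, dim, hidden_dim):
        super().__init__()
        # learnable complex-valued filter over the token-frequency axis
        self.filter = nn.Parameter(torch.randn(num_tokens // 2 + 1, dim, 2) * 0.02)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):  # x: (B, N, C) point/pixel tokens
        freq = torch.fft.rfft(x, dim=1, norm="ortho")      # (B, N//2 + 1, C)
        freq = freq * torch.view_as_complex(self.filter)   # reweight frequencies
        x = x + torch.fft.irfft(freq, n=x.size(1), dim=1, norm="ortho")
        return x + self.mlp(self.norm(x))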

Dataset: Check out DTTD-Mobile and the Robotics Dataset Extension.

Updates:

  • [05/01/25] The archival version of this work will be presented at 2025 CVPRW: Mobile AI.
  • [11/05/24] We extended DTTD to specific grasping and insertion tasks using a FANUC robotic arm, released here. Feel free to contact zixun@berkeley.edu and xiang_zhang_98@berkeley.edu for details on this dataset extension.
  • [11/05/24] The DTTD-Mobile dataset has been migrated to Hugging Face due to Google Drive storage issues; check here.
  • [09/10/24] Our MoCap data pipeline has been released; check here (iPhone-ARKit-based version) for your own customized data collection and annotation. For our iPhone data capture app, check here. For our previously released Azure-based version, check here.
  • [06/17/24] Our work has been accepted at the 2024 ICML workshop on Data-centric Machine Learning Research: demo video, openreview.
  • [09/28/23] Our trained checkpoints for the pose estimator are released here.
  • [09/27/23] Our dataset, DTTD-Mobile, is released.

Citation

If our work is useful or relevant to your research, please recognize our contributions by citing our papers:

@InProceedings{dttd2,
    author    = {Huang, Zixun and Yao, Keling and Zhao, Zhihao and Pan, Chuanyu and Yang, Allen},
    title     = {Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1848-1857}
}
@InProceedings{dttd1,
    author    = {Feng, Weiyu and Zhao, Seth Z. and Pan, Chuanyu and Chang, Adam and Chen, Yichen and Wang, Zekun and Yang, Allen Y.},
    title     = {Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3288-3297}
}

Dependencies:

Before running our pose estimation pipeline, create and activate a conda environment with Python >= 3.8:

conda create --name [YOUR ENV NAME] python=[PYTHON VERSION]
conda activate [YOUR ENV NAME]
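For example (the environment name here is just a placeholder of our choosing):

conda create --name dttdnet python=3.8
conda activate dttdnet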

then install all necessary packages:

torch
torchvision
torchaudio
numpy
einops
pillow
scipy
opencv_python
tensorboard
tqdm
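For example, via pip (note that opencv_python is published on PyPI as opencv-python):

pip install torch torchvision torchaudio numpy einops pillow scipy opencv-python tensorboard tqdm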

For the kNN module used in the ADD-S loss, install KNN_CUDA: https://github.com/pptrick/KNN_CUDA. (Installing KNN_CUDA requires a CUDA environment; make sure your CUDA version is >= 10.2. It also only supports torch v1.0+.)
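As a rough illustration of where the kNN module fits, below is a minimal sketch of an ADD-S-style distance built on the KNN_CUDA API as shown in its README. The function name, tensor shapes, and mean reduction are our assumptions, not the repository's actual loss implementation.

import torch
from knn_cuda import KNN

def adds_distance(pred_points, gt_points):
    # pred_points, gt_points: (B, N, 3) CUDA tensors of model points
    # transformed by the predicted and ground-truth poses, respectively.
    knn = KNN(k=1, transpose_mode=True)
    # for each predicted point, distance to its nearest ground-truth point
    dist, _ = knn(gt_points, pred_points)  # (B, N, 1)
    return dist.mean()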

Load Dataset and Checkpoints:

Download our checkpoints and datasets, then organize them into the following file structure:

Robust-Digital-Twin-Tracking
├── checkpoints
│   ├── m8p4.pth
│   ├── m8p4_filter.pth
│   └── ...
└── dataset
    └── dttd_iphone
        ├── dataset_config
        ├── dataset.py
        └── DTTD_IPhone_Dataset
            └── root
                ├── cameras
                │   ├── az_camera1 (if you want to train our algorithm with DTTD v1)
                │   ├── iphone14pro_camera1
                │   └── ZED2 (to be released...)
                ├── data
                │   ├── scene1
                │   │   ├── data
                │   │   │   ├── 00001_color.jpg
                │   │   │   ├── 00001_depth.png
                │   │   │   └── ...
                │   │   └── scene_meta.yaml
                │   ├── scene2
                │   │   ├── data
                │   │   └── scene_meta.yaml
                │   ...
                └── objects
                    ├── apple
                    │   ├── apple.mtl
                    │   ├── apple.obj
                    │   ├── front.xyz
                    │   ├── points.xyz
                    │   ├── ...
                    │   └── textured.obj.mtl
                    ├── black_expo_marker
                    └── ...

Run Estimation:

This repository contains scripts for 6DoF object pose estimation (end-to-end coarse estimation). To run estimation, please make sure you have installed all the dependencies.


To run DTTDNet (either training or evaluation), first download the dataset. We recommend creating a soft link to it inside the dataset/dttd_iphone/ folder:

ln -s <path to dataset>/DTTD_IPhone_Dataset ./dataset/dttd_iphone/

To run the trained estimator on the test dataset, move to ./eval/. For example, to evaluate on the DTTD v2 dataset:

cd eval/dttd_iphone/
bash eval.sh

You can customize your own eval command, for example:

python eval.py --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root \
                --model ./checkpoints/m2p1.pth \
                --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 2 --layer_num_p 1 \
                --visualize --output eval_results_m8p4_model_filtered_best

To load a model with the filter-enhanced MLP, add the --filter flag. To visualize the attention maps and/or the distribution of the reduced geometric embeddings, add the --debug flag.
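For example, to evaluate the filter-enhanced checkpoint with visualizations (the output directory name is our own placeholder, and we infer --layer_num_m 8 --layer_num_p 4 from the m8p4 checkpoint naming):

python eval.py --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root \
                --model ./checkpoints/m8p4_filter.pth \
                --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 8 --layer_num_p 4 \
                --filter --debug --visualize --output eval_results_m8p4_filter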

Eval:

This is the toolbox we used to evaluate and compare experiment results.

Train:

To train our model, you can use:

python train.py --device 0 \
    --dataset iphone --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root --dataset_config ./dataset/dttd_iphone/dataset_config \
    --output_dir ./result/result \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 8 --layer_num_p 4 \
    --recon_w 0.3 --recon_choice depth \
    --loss adds --optim_batch 4 \
    --start_epoch 0 \
    --lr 1e-5 --min_lr 1e-6 --lr_rate 0.3 --decay_margin 0.033 --decay_rate 0.77 --nepoch 60 --warm_epoch 1 \
    --filter_enhance

To train a smaller model, set --layer_num_m 2 --layer_num_p 1. To enable our depth-robustifying modules, add --filter_enhance and/or --recon_choice model.

To adjust the weight of the Chamfer distance loss to 0.5, set --recon_w 0.5.
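For example, a smaller model with both depth-robustifying modules and a heavier reconstruction loss (the output directory is our own placeholder; the remaining hyperparameters follow the full command above):

python train.py --device 0 \
    --dataset iphone --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root --dataset_config ./dataset/dttd_iphone/dataset_config \
    --output_dir ./result/result_small \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 2 --layer_num_p 1 \
    --recon_w 0.5 --recon_choice model \
    --loss adds --optim_batch 4 \
    --lr 1e-5 --min_lr 1e-6 --lr_rate 0.3 --decay_margin 0.033 --decay_rate 0.77 --nepoch 60 --warm_epoch 1 \
    --filter_enhance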

Our model is applicable to the YCB-Video dataset (YCBV_Dataset) and DTTD v1 as well. Please try the following commands to train our method on other datasets (make sure you have downloaded the dataset you want to train on):

python train.py --dataset ycb --output_dir ./result/train_result --device 0 --batch_size 1 --lr 8e-5 --min_lr 8e-6 --warm_epoch 1
python train.py --dataset dttd --output_dir ./result/train_result --device 0 --batch_size 1 --lr 1e-5 --min_lr 1e-6 --warm_epoch 1
