DTTDNet: Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset
This repository contains the implementation code of the 2025 CVPR Workshop (Mobile AI) paper "Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset"
Are current 3D object tracking methods truly robust enough for low-fidelity depth sensors like the iPhone LiDAR?
We introduce DTTD-Mobile, a new benchmark built on real-world data captured from mobile devices. We evaluate several popular methods, including BundleSDF, ES6D, MegaPose, and DenseFusion, and highlight their limitations in this challenging setting. To go a step further, we propose DTTD-Net with a Fourier-enhanced MLP and a two-stage attention-based fusion across RGB and depth, making 6DoF pose estimation more robust, even when the input is noisy, blurry, or partially occluded.
Dataset: Check out DTTD-Mobile and the Robotics Dataset Extension.
- [05/01/25] The archival version of this work will be presented at 2025 CVPRW: Mobile AI.
- [11/05/24] We extended DTTD for specific grasping and insertion tasks using a FANUC robotic arm, released here. Feel free to contact zixun@berkeley.edu and xiang_zhang_98@berkeley.edu for details on this dataset extension.
- [11/05/24] The DTTD-Mobile dataset has been migrated to Hugging Face due to Google Drive storage issues; check here.
- [09/10/24] Our MoCap data pipeline has been released; check here (iPhone-ARKit-based version) for your own customized data collection and annotation. For the release of our data capture app for iPhone, check here. For our previously released Azure-based version, check here.
- [06/17/24] Our work has been accepted at the 2024 ICML Workshop on Data-centric Machine Learning Research: demo video, openreview.
- [09/28/23] Our trained checkpoints for the pose estimator are released here.
- [09/27/23] Our dataset, DTTD-Mobile, is released.
If our work is useful or relevant to your research, please recognize our contributions by citing our papers:
@InProceedings{dttd2,
    author    = {Huang, Zixun and Yao, Keling and Zhao, Zhihao and Pan, Chuanyu and Yang, Allen},
    title     = {Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1848-1857}
}
@InProceedings{dttd1,
    author    = {Feng, Weiyu and Zhao, Seth Z. and Pan, Chuanyu and Chang, Adam and Chen, Yichen and Wang, Zekun and Yang, Allen Y.},
    title     = {Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3288-3297}
}
Before running our pose estimation pipeline, create and activate a conda environment with Python >= 3.8:
conda create --name [YOUR ENVIR NAME] python=[PYTHON VERSION]
conda activate [YOUR ENVIR NAME]
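For example, with a hypothetical environment name dttdnet and Python 3.8:
conda create --name dttdnet python=3.8
conda activate dttdnet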
Then install all the necessary packages:
torch
torchvision
torchaudio
numpy
einops
pillow
scipy
opencv_python
tensorboard
tqdm
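These dependencies can typically be installed with pip (versions are not pinned here; choose a torch build that matches your CUDA version):
pip install torch torchvision torchaudio numpy einops pillow scipy opencv_python tensorboard tqdm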
For the KNN module used in the ADD-S loss, install KNN_CUDA: https://github.com/pptrick/KNN_CUDA. (Installing KNN_CUDA requires a CUDA environment; ensure that your CUDA version is >= 10.2. Also, it only supports torch v1.0+.)
Download our checkpoints and datasets, then organize the file structure as follows:
Robust-Digital-Twin-Tracking
├── checkpoints
│   ├── m8p4.pth
│   ├── m8p4_filter.pth
│   └── ...
├── ...
└── dataset
    └── dttd_iphone
        ├── dataset_config
        ├── dataset.py
        └── DTTD_IPhone_Dataset
            └── root
                ├── cameras
                │   ├── az_camera1 (if you want to train our algorithm with DTTD v1)
                │   ├── iphone14pro_camera1
                │   └── ZED2 (to be released...)
                ├── data
                │   ├── scene1
                │   │   ├── data
                │   │   │   ├── 00001_color.jpg
                │   │   │   ├── 00001_depth.png
                │   │   │   └── ...
                │   │   └── scene_meta.yaml
                │   ├── scene2
                │   │   ├── data
                │   │   └── scene_meta.yaml
                │   └── ...
                └── objects
                    ├── apple
                    │   ├── apple.mtl
                    │   ├── apple.obj
                    │   ├── front.xyz
                    │   ├── points.xyz
                    │   ├── ...
                    │   └── textured.obj.mtl
                    ├── black_expo_marker
                    └── ...
This repository contains scripts for 6DoF object pose estimation (end-to-end coarse estimation). Before running estimation, please make sure you have installed all the dependencies.
To run DTTD-Net (either training or evaluation), first download the dataset. It is recommended to create a soft link to the dataset/dttd_iphone/ folder using:
ln -s <path to dataset>/DTTD_IPhone_Dataset ./dataset/dttd_iphone/
To run the trained estimator on the test dataset, move to ./eval/. For example, to evaluate on the DTTD v2 dataset:
cd eval/dttd_iphone/
bash eval.sh
You can customize your own eval command, for example:
python eval.py --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root \
    --model ./checkpoints/m2p1.pth \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 2 --layer_num_p 1 \
    --visualize --output eval_results_m8p4_model_filtered_best
To load a model with the filter-enhanced MLP, please add the flag --filter.
To visualize the attention map and/or the distribution of the reduced geometric embeddings, you can add the flag --debug (see the example command below).
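For example, a hypothetical command that evaluates the filter-enhanced checkpoint with both debugging and visualization enabled (this assumes m8p4_filter.pth was trained with --layer_num_m 8 --layer_num_p 4, following its name; the output directory name is arbitrary):
python eval.py --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root \
    --model ./checkpoints/m8p4_filter.pth \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 8 --layer_num_p 4 \
    --filter --debug \
    --visualize --output eval_results_m8p4_filter_debug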
This is the ToolBox that we used to evaluate and compare the experimental results.
To train our method, you can use:
python train.py --device 0 \
    --dataset iphone --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root --dataset_config ./dataset/dttd_iphone/dataset_config \
    --output_dir ./result/result \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 8 --layer_num_p 4 \
    --recon_w 0.3 --recon_choice depth \
    --loss adds --optim_batch 4 \
    --start_epoch 0 \
    --lr 1e-5 --min_lr 1e-6 --lr_rate 0.3 --decay_margin 0.033 --decay_rate 0.77 --nepoch 60 --warm_epoch 1 \
    --filter_enhance
To train a smaller model, you can set the flags --layer_num_m 2 --layer_num_p 1.
To enable the depth-robustifying modules of our method, you can add the flags --filter_enhance and/or --recon_choice model.
To adjust the weight of the Chamfer distance loss to 0.5, you can set --recon_w 0.5 (see the example command below).
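For example, a hypothetical command that trains the smaller model with both depth-robustifying modules and a Chamfer distance loss weight of 0.5 (the output directory ./result/result_small is only a placeholder; the remaining flags follow the full training command above):
python train.py --device 0 \
    --dataset iphone --dataset_root ./dataset/dttd_iphone/DTTD_IPhone_Dataset/root --dataset_config ./dataset/dttd_iphone/dataset_config \
    --output_dir ./result/result_small \
    --base_latent 256 --embed_dim 512 --fusion_block_num 1 --layer_num_m 2 --layer_num_p 1 \
    --recon_w 0.5 --recon_choice model \
    --loss adds --optim_batch 4 \
    --start_epoch 0 \
    --lr 1e-5 --min_lr 1e-6 --lr_rate 0.3 --decay_margin 0.033 --decay_rate 0.77 --nepoch 60 --warm_epoch 1 \
    --filter_enhance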
Our model is applicable to YCBV_Dataset and DTTD_v1 as well. Please try the following commands to train our method on these other datasets (make sure you have downloaded the dataset you want to train on):
python train.py --dataset ycb --output_dir ./result/train_result --device 0 --batch_size 1 --lr 8e-5 --min_lr 8e-6 --warm_epoch 1
python train.py --dataset dttd --output_dir ./result/train_result --device 0 --batch_size 1 --lr 1e-5 --min_lr 1e-6 --warm_epoch 1