By Bowen Zhang*, Yiji Cheng*, Chunyu Wang†, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, and Baining Guo.
Paper | Project Page | Code
Demo video: RodinHD_realworld.mp4
We recommend using Anaconda to create a new environment and install the dependencies. Our code is tested with Python 3.8 on Linux. The model is trained, and inference is run, on NVIDIA V100 GPUs.
conda create -n rodinhd python=3.8
conda activate rodinhd
pip install -r requirements.txt
cd Renderer
pip install -r requirements.txt
conda install mpi4py
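As an optional sanity check (this snippet is our suggestion, not part of the released code), the following verifies that PyTorch can see a GPU and that mpi4py imports correctly, assuming both were installed by the steps above:

# check_env.py -- optional environment sanity check
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

try:
    import mpi4py  # installed above; used for multi-process training
    print("mpi4py:", mpi4py.__version__)
except ImportError:
    print("mpi4py missing; re-run `conda install mpi4py`")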
Due to organization policy, the training data is not publicly available. You can prepare your own data following the instructions below. Your 3D dataset should be organized as follows:
data
├── obj_00
│   ├── img_proc_fg_000000.png
│   ├── img_proc_fg_000001.png
│   ├── ...
│   ├── metadata_000000.json
│   ├── metadata_000001.json
│   ├── ...
├── obj_01
│   ├── ...
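Before encoding, it can help to confirm that every object folder follows this layout. A minimal sketch (the glob patterns simply mirror the example tree above and may need adjusting to your data):

# check_dataset.py -- minimal layout check based on the example tree above
import glob
import os

root = "/path/to/data"
for obj in sorted(os.listdir(root)):
    obj_dir = os.path.join(root, obj)
    if not os.path.isdir(obj_dir):
        continue
    imgs = glob.glob(os.path.join(obj_dir, "img_proc_fg_*.png"))
    metas = glob.glob(os.path.join(obj_dir, "metadata_*.json"))
    if not imgs or len(imgs) != len(metas):
        print(f"[warn] {obj}: {len(imgs)} images, {len(metas)} metadata files")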
Then encode the multi-scale VAE features of the frontal images of each object using the following command:
cd scripts
python encode_multiscale_feature.py --root /path/to/data --output_dir /path/to/feature --txt_file /path/to/txt_file --start_idx 0 --end_idx 1000
Here --txt_file is a text file listing the objects to be encoded; it can be generated with ls /path/to/data > /path/to/txt_file.
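Because the script exposes --start_idx and --end_idx, encoding can be split into index chunks, for example to distribute the work across several jobs or GPUs. A minimal sketch that launches the chunks sequentially from the scripts directory (chunk size and paths are placeholders):

# run_encoding_chunks.py -- split encoding over index ranges via --start_idx/--end_idx
# (run from the scripts/ directory; paths and chunk size are placeholders)
import subprocess

txt_file = "/path/to/txt_file"
with open(txt_file) as f:
    num_objects = sum(1 for _ in f)

chunk = 250  # objects per job; adjust to your setup
for start in range(0, num_objects, chunk):
    end = min(start + chunk, num_objects)
    subprocess.run([
        "python", "encode_multiscale_feature.py",
        "--root", "/path/to/data",
        "--output_dir", "/path/to/feature",
        "--txt_file", txt_file,
        "--start_idx", str(start),
        "--end_idx", str(end),
    ], check=True)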
Run inference with the base diffusion model:
cd scripts
sh base_sample.sh
Then run inference with the upsample diffusion model:
cd scripts
sh upsample_sample.sh
You need to modify the arguments in the scripts to fit your own data path.
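If you prefer a single entry point, the two sampling stages can be chained with a small wrapper; this sketch only calls the scripts above in the order given, and assumes their arguments have already been adapted and that it is run from the repository root:

# sample_pipeline.py -- runs the two sampling stages in order (assumes the
# arguments inside the scripts have already been adapted to your paths)
import subprocess

for script in ["base_sample.sh", "upsample_sample.sh"]:
    subprocess.run(["sh", script], cwd="scripts", check=True)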
We first fit the shared feature decoder with the proposed task-replay and identity-aware weight consolidation strategies using:
cd Renderer
sh fit_stage1.sh
To enable distributed single-machine multi-GPU training for stage 1, run the following script:
CUDA_VISIBLE_DEVICES=2,3,4 sh fit_stage1_dist.sh
Then we fix the shared feature decoder and fit each triplane per object using:
sh fit_stage2.sh
You need to modify the arguments in the scripts to fit your own data path.
After fitting the triplanes, we train the base diffusion model using:
sh ../scripts/base_train.sh
Then we train the upsample diffusion model using:
sh ../scripts/upsample_train.sh
You need to modify the arguments in the scripts to fit your own data path.
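The full training pipeline (shared decoder fitting, per-object triplane fitting, base diffusion, upsample diffusion) can likewise be chained. The sketch below simply invokes the four scripts above in order, assuming their arguments have already been set up and that it is run from the repository root:

# train_pipeline.py -- chains the four training stages described above
# (assumes script arguments are already adapted; run from the repo root)
import subprocess

stages = [
    ("Renderer", "fit_stage1.sh"),     # shared feature decoder
    ("Renderer", "fit_stage2.sh"),     # per-object triplane fitting
    ("scripts", "base_train.sh"),      # base diffusion model
    ("scripts", "upsample_train.sh"),  # upsample diffusion model
]
for cwd, script in stages:
    subprocess.run(["sh", script], cwd=cwd, check=True)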
This repository is built upon improved-diffusion, torch-ngp and Rodin. Thanks for their great work!
If you find our work useful in your research, please consider citing:
@article{zhang2024rodinhd,
title={RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models},
author={Zhang, Bowen and Cheng, Yiji and Wang, Chunyu and Zhang, Ting and Yang, Jiaolong and Tang, Yansong and Zhao, Feng and Chen, Dong and Guo, Baining},
journal={arXiv preprint arXiv:2407.06938},
year={2024}
}