|
| 1 | +<div align="center"> |
| 2 | + |
| 3 | +## PC^2 Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction |
| 4 | +### CVPR 2023 (Highlight) |
| 5 | + |
| 6 | +[arXiv:2302.10668](https://arxiv.org/abs/2302.10668) |
| 8 | +</div> |
| 9 | + |
| 10 | +## Table of Contents |
| 11 | + |
| 12 | +- [Overview](#overview) |
| 13 | + * [Explanatory Video](#explanatory-video) |
| 14 | + * [Code Overview](#code-overview) |
| 15 | + * [Abstract](#abstract) |
| 16 | + * [Examples](#examples) |
| 17 | + * [Method](#method) |
| 18 | +- [Running the code](#running-the-code) |
| 19 | + * [Dependencies](#dependencies) |
| 20 | + * [Data](#data) |
| 21 | + * [Training](#training) |
| 22 | + * [Sampling](#sampling) |
| 23 | + * [Pretrained checkpoints](#pretrained-checkpoints) |
| 24 | + * [Common issues](#common-issues) |
| 25 | +- [Acknowledgement](#acknowledgement) |
| 26 | +- [Citation](#citation) |
| 27 | + |
| 28 | +## Overview |
| 29 | + |
| 30 | +### Explanatory Video |
| 31 | + |
| 32 | +<div align="center"> <a href="https://www.youtube.com/watch?v=kAkwpsT1pRA"><img src="https://img.youtube.com/vi/kAkwpsT1pRA/0.jpg" alt="Explanatory Video"></a></div> |
| 33 | + |
| 34 | +### Code Overview |
| 35 | + |
| 36 | +This repository uses [PyTorch3D](https://github.com/facebookresearch/pytorch3d) for most 3D operations. It uses [Hydra](https://hydra.cc/docs/intro/) for configuration, and the config is located at `config/structured.py`. The entrypoints for training are `main.py` for the point cloud diffusion model and `main_coloring.py` for the point cloud coloring model. There are shared utilities in `diffusion_utils.py` and `training_utils.py`. The data is [Co3Dv2](https://github.com/facebookresearch/co3d). |
| 37 | + |
| 38 | +I substantially refactored the repository for the public release to use the `diffusers` library from HuggingFace. As a result, most of the code differs from the original code used for the paper. Only the Co3Dv2 dataset is implemented in this version of the code, but it should be easy to adapt to other datasets if you need to. |
| 39 | + |
| 40 | +If you have any questions or contributions, feel free to leave an issue or a pull request. |
| 41 | + |
| 42 | +### Abstract |
| 43 | + |
| 44 | +Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision. In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process. Our method takes as input a single RGB image along with its camera pose and gradually denoises a set of 3D points, whose positions are initially sampled randomly from a three-dimensional Gaussian distribution, into the shape of an object. The key to our method is a geometrically-consistent conditioning process which we call projection conditioning: at each step in the diffusion process, we project local image features onto the partially-denoised point cloud from the given camera pose. This projection conditioning process enables us to generate high-resolution sparse geometries that are well-aligned with the input image, and can additionally be used to predict point colors after shape reconstruction. Moreover, due to the probabilistic nature of the diffusion process, our method is naturally capable of generating multiple different shapes consistent with a single input image. In contrast to prior work, our approach not only performs well on synthetic benchmarks, but also gives large qualitative improvements on complex real-world data. |
| 45 | + |
| 46 | +### Examples |
| 47 | + |
| 48 | + |
| 49 | + |
| 50 | +### Method |
| 51 | + |
| 52 | + |
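
As a rough illustration of the projection conditioning idea described in the abstract, the sketch below projects 3D points into an image with a simple pinhole camera and looks up a feature vector for each point. It is not the repository's implementation (which relies on PyTorch3D); the function name `project_features`, the pinhole model with a centred principal point, the nearest-pixel lookup, and the choice to ignore occlusion are all assumptions made to keep the example dependency-free.

```python
from typing import List, Sequence


def project_features(
    points: Sequence[Sequence[float]],
    feature_map: Sequence[Sequence[Sequence[float]]],
    focal: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[List[float]]:
    """Attach an image feature to each 3D point (hypothetical helper).

    points:      N world-space points (x, y, z).
    feature_map: H x W x C nested lists of per-pixel features.
    focal:       focal length in pixels (principal point assumed at the centre).
    rotation, translation: world-to-camera transform (3x3 rows, 3-vector).
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0

    features = []
    for p in points:
        # World -> camera coordinates.
        xc, yc, zc = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:
            # Behind the camera: no image evidence, so use a zero feature.
            features.append([0.0] * channels)
            continue
        # Perspective projection to the nearest pixel.
        u = round(focal * xc / zc + cx)
        v = round(focal * yc / zc + cy)
        if 0 <= u < width and 0 <= v < height:
            features.append(list(feature_map[v][u]))
        else:
            # Projects outside the image: zero feature.
            features.append([0.0] * channels)
    return features
```

In a diffusion loop, a lookup of this kind would be repeated at every denoising step on the current, partially denoised points, so that the features follow the points as they move.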
| 53 | + |
| 54 | + |
| 55 | +## Running the code |
| 56 | + |
| 57 | +### Dependencies |
| 58 | + |
| 59 | +Dependencies may be installed with pip: |
| 60 | +```bash |
| 61 | +pip install -r requirements.txt |
| 62 | +``` |
| 63 | + |
| 64 | +PyTorch and PyTorch3D are not included in `requirements.txt` because that sometimes messes up `conda` installations by trying to re-install PyTorch using `pip`. I assume you've already installed these by yourself. If not, you can use a command such as: |
| 65 | + |
| 66 | +```bash |
| 67 | +mamba install pytorch torchvision pytorch-cuda=11.7 pytorch3d -c pytorch -c nvidia -c pytorch3d |
| 68 | +``` |
| 69 | + |
| 70 | +### Data |
| 71 | + |
| 72 | +For our data, we use [Co3Dv2](https://github.com/facebookresearch/co3d). Full information about the dataset is provided on the GitHub page. |
| 73 | + |
| 74 | +We train on individual categories, so you can just download one category or a subset of the categories (for example hydrants or teddy bears). |
| 75 | + |
| 76 | +Then you can set the environment variable `CO3DV2_DATASET_ROOT` to the dataset root: |
| 77 | +```bash |
| 78 | +export CO3DV2_DATASET_ROOT="your_dataset_root_folder" |
| 79 | +``` |
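
Before launching a long job, it can help to confirm that the variable is set and that the category you want is present. The following is a minimal sketch, assuming each category lives in a sub-folder of the dataset root named after the category; the helper `resolve_category_dir` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_category_dir(
    category: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the folder for one category, or raise a descriptive error.

    Assumes the layout <CO3DV2_DATASET_ROOT>/<category>/ (an assumption
    about the dataset layout, not something this repository guarantees).
    """
    env = os.environ if env is None else env
    root = env.get("CO3DV2_DATASET_ROOT")
    if not root:
        raise RuntimeError("CO3DV2_DATASET_ROOT is not set")
    category_dir = Path(root).expanduser() / category
    if not category_dir.is_dir():
        raise FileNotFoundError(f"No folder for category '{category}' at {category_dir}")
    return category_dir
```

Calling `resolve_category_dir("hydrant")` in a Python shell should either return the category path or explain what is missing.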
| 80 | + |
| 81 | +### Training |
| 82 | + |
| 83 | +The config is in `config/structured.py`. |
| 84 | + |
| 85 | +You can specify your job mode using `run.job=train`, `run.job=train_coloring`, `run.job=sample`, or `run.job=sample_coloring`. By default, the mode is set to `train`. |
| 86 | + |
| 87 | +An example training command is: |
| 88 | +```bash |
| 89 | +python main.py dataset.category=hydrant dataloader.batch_size=24 dataloader.num_workers=8 run.vis_before_training=True run.val_before_training=True run.name=train__hydrant__ebs_24 |
| 90 | +``` |
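
Because models are trained per category, you may want to generate one command per category. The sketch below assembles commands in the same `key=value` override style as the example above; the helpers `build_command` and `training_commands` are hypothetical, and the category names are assumed to match the names used by your copy of the dataset.

```python
import shlex
from typing import Dict, Iterable, List, Union

OverrideValue = Union[str, int, float, bool]


def build_command(entrypoint: str, overrides: Dict[str, OverrideValue]) -> List[str]:
    """Turn a dict of dotted config keys into a `python <entrypoint> k=v ...` argv list."""
    return ["python", entrypoint] + [f"{key}={value}" for key, value in overrides.items()]


def training_commands(categories: Iterable[str], batch_size: int = 24) -> List[List[str]]:
    """Build one training command per category, mirroring the example's run name."""
    return [
        build_command(
            "main.py",
            {
                "dataset.category": category,
                "dataloader.batch_size": batch_size,
                "run.name": f"train__{category}__ebs_{batch_size}",
            },
        )
        for category in categories
    ]


if __name__ == "__main__":
    # Print shell-ready commands; pass each argv list to subprocess.run to execute.
    for argv in training_commands(["hydrant", "teddybear"]):
        print(shlex.join(argv))
```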
| 91 | + |
| 92 | +To run multiple jobs in parallel on a SLURM cluster, you can use a script such as: |
| 93 | +```bash |
| 94 | +python scripts/example-slurm.py --partition ${PARTITION_NAME} --submit |
| 95 | +``` |
| 96 | + |
| 97 | +Separately, you can train a coloring model to predict the color of points with fixed locations in 3D space. |
| 98 | + |
| 99 | +An example command is: |
| 100 | +```bash |
| 101 | +python main_coloring.py run.job=train_coloring model=coloring_model run.mixed_precision=no dataset.category=hydrant dataloader.batch_size=24 run.max_steps=20_000 run.coloring_training_noise_std=0.1 run.name=train_coloring__hydrant__ebs_24 |
| 102 | +``` |
| 103 | + |
| 104 | +### Sampling |
| 105 | + |
| 106 | +For sampling point clouds, use `run.job=sample`. |
| 107 | + |
| 108 | +For example: |
| 109 | +```bash |
| 110 | +python main.py run.job=sample dataloader.batch_size=16 dataloader.num_workers=6 dataset.category=hydrant checkpoint.resume="/path/to/checkpoint/like/train__hydrant__ebs_24/2022-11-01--17-04-36/checkpoint-latest.pth" run.name=sample__hydrant__ebs_24 |
| 111 | +``` |
| 112 | + |
| 113 | +Results will be saved to your output directory. |
| 114 | + |
| 115 | +Afterwards, you can predict colors using the point clouds obtained from the sampling procedure above, specifying them with the argument `run.coloring_sample_dir`. |
| 116 | + |
| 117 | +For example: |
| 118 | +```bash |
| 119 | +python main_coloring.py run.job=sample_coloring dataset.category=hydrant dataloader.batch_size=8 model=coloring_model checkpoint.resume="/path/to/coloring/model/checkpoint-latest.pth" run.coloring_sample_dir="/path/to/sample/dir/like/sample__hydrant__ebs_24/2022-09-22--18-03-20/sample/" run.name=sample_coloring__hydrant__ebs_24 |
| 120 | +``` |
| 121 | + |
| 122 | +_Side note:_ although this is called "`sample_coloring`" in the code, it is not really doing any sampling because the coloring model is deterministic. |
| 123 | + |
| 124 | +### Pretrained checkpoints |
| 125 | + |
| 126 | +You can download example checkpoints here: |
| 127 | +```bash |
| 128 | +# Downloads checkpoint and logs (1.2G) |
| 129 | +bash ./scripts/download-example-logs-and-checkpoints.sh |
| 130 | +# Downloads visualizations over the course of training, as an example. Since |
| 131 | +# these are large (3.5G), we have made them a separate download. |
| 132 | +bash ./scripts/download-example-vis.sh |
| 133 | +``` |
| 134 | +These models were newly trained with this codebase. We can train and upload models for other categories as well if you would like; just let us know. |
| 135 | + |
| 136 | +### Common issues |
| 137 | + |
| 138 | +(1) If you get an error of the form `Error building extension '_pvcnn_backend'`, make sure you have installed `gcc` and `g++`. Then check the path in `model/pvcnn/modules/functional/backend.py` and edit it to your desired location. |
| 139 | + |
| 140 | +(2) I believe PyTorch3D has undergone some large changes recently, and it is possible that some of its code is now broken. I am using version 0.7.3 with the following patch on line 634 of `pytorch3d/implicitron/dataset/frame_data.py`: |
| 141 | +```python |
| 142 | +image_rgb = torch.from_numpy(load_image(self._local_path(path))) |
| 143 | +``` |
| 144 | + |
| 145 | +(3) You may also have to patch the `accelerate` library in order to properly batch the `FrameData` objects from PyTorch3D. To fix this, I replaced the following lines in `accelerate/utils/operations.py` (L91-99): |
| 146 | +```python |
| 147 | +elif isinstance(data, Mapping): |
| 148 | + return type(data)( |
| 149 | + { |
| 150 | + k: recursively_apply( |
| 151 | + func, v, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs |
| 152 | + ) |
| 153 | + for k, v in data.items() |
| 154 | + } |
| 155 | + ) |
| 156 | +``` |
| 157 | +with the following lines: |
| 158 | +```python |
| 159 | +elif isinstance(data, Mapping): |
| 160 | + from pytorch3d.implicitron.dataset.data_loader_map_provider import FrameData |
| 161 | + if isinstance(data, (FrameData)): |
| 162 | + return type(data)( |
| 163 | + **{ |
| 164 | + k: recursively_apply( |
| 165 | + func, v, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs |
| 166 | + ) |
| 167 | + for k, v in data.items() |
| 168 | + } |
| 169 | + ) |
| 170 | + else: |
| 171 | + return type(data)( |
| 172 | + { |
| 173 | + k: recursively_apply( |
| 174 | + func, v, *args, test_type=test_type, error_on_other_type=error_on_other_type, **kwargs |
| 175 | + ) |
| 176 | + for k, v in data.items() |
| 177 | + } |
| 178 | + ) |
| 179 | +``` |
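
To see why the patch above switches to keyword unpacking (`**{...}`), consider the toy example below. It assumes that `FrameData` is a dataclass that also implements the `Mapping` interface; `ToyFrame` is a hypothetical stand-in, and the two `apply_*` functions are simplified imitations of the generic and patched branches, not the `accelerate` code itself.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any, Callable, Iterator, Optional


@dataclass
class ToyFrame(Mapping):
    """Hypothetical dataclass that can also be read like a dict."""

    image: Optional[Any] = None
    depth: Optional[Any] = None

    def __getitem__(self, key: str) -> Any:
        if key not in {f.name for f in fields(self)}:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self) -> Iterator[str]:
        return (f.name for f in fields(self))

    def __len__(self) -> int:
        return len(fields(self))


def apply_generic(func: Callable[[Any], Any], data: Mapping) -> Mapping:
    # Generic branch: rebuild the mapping from ONE positional dict argument.
    return type(data)({k: func(v) for k, v in data.items()})


def apply_keyword(func: Callable[[Any], Any], data: Mapping) -> Mapping:
    # Patched branch: pass each entry as a keyword argument to the constructor.
    return type(data)(**{k: func(v) for k, v in data.items()})


frame = ToyFrame(image=1, depth=2)
broken = apply_generic(lambda v: v * 10, frame)  # whole dict lands in `image`
fixed = apply_keyword(lambda v: v * 10, frame)   # fields are restored correctly
```

With the generic branch, the entire dictionary is bound to the first dataclass field and the remaining fields keep their defaults, whereas keyword unpacking restores each field. Plain `dict` inputs still work with the generic branch, which is why the patch keeps it in the `else` case.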
| 180 | + |
| 181 | +## Acknowledgement |
| 182 | + |
| 183 | +* The [PyTorch3D](https://github.com/facebookresearch/pytorch3d) library. |
| 184 | +* The [diffusers](https://github.com/huggingface/diffusers) library. |
| 185 | +* The [Co3D and Co3Dv2](https://github.com/facebookresearch/co3d) datasets. |
| 186 | +* _Our funding:_ Luke Melas-Kyriazi is supported by the Rhodes Trust. Andrea Vedaldi and Christian Rupprecht are supported by ERC-UNION-CoG-101001212. Christian Rupprecht is also supported by VisualAI EP/T028572/1. |
| 187 | + |
| 188 | +## Citation |
| 189 | +``` |
| 190 | +@misc{melaskyriazi2023projection, |
| 191 | + doi = {10.48550/ARXIV.2302.10668}, |
| 192 | + url = {https://arxiv.org/abs/2302.10668}, |
| 193 | + author = {Melas-Kyriazi, Luke and Rupprecht, Christian and Vedaldi, Andrea}, |
| 194 | + title = {PC^2 Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction}, |
| 195 | + publisher = {arXiv}, |
| 196 | + year = {2023}, |
| 197 | +} |
| 198 | +``` |