BOOTPLACE is a paradigm that formulates object placement as a placement-by-detection problem. It first identifies suitable regions of interest for object placement by training a dedicated detection transformer on object-subtracted backgrounds with multi-object supervision. It then semantically associates each target compositing object with a detected region based on their complementary characteristics. Through a bootstrapped training approach applied to randomly object-subtracted images, it enforces meaningful placements via extensive paired data augmentation.
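At a glance, the pipeline can be sketched as below; every name in this sketch is an illustrative stand-in, not the actual BOOTPLACE API:

```python
# Illustrative sketch of placement-by-detection; all names here are
# hypothetical stand-ins for the corresponding BOOTPLACE components.

def place_object(background, target_object, detect_regions, score_pair):
    """Pick a placement box for `target_object` on `background`.

    detect_regions: callable returning candidate boxes on the
                    object-subtracted background (the detection
                    transformer's role).
    score_pair:     callable scoring how well the object complements
                    a candidate region (the semantic association step).
    """
    candidates = detect_regions(background)
    # Rank candidate regions by semantic affinity with the object and
    # keep the best one; compositing happens downstream.
    return max(candidates, key=lambda box: score_pair(target_object, box))
```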
Check out our Project Page for more visual demos!
- 03/20/2025: Release training code and pretrained models.
- 06/24/2025: Release inference code and data.
- System: The code is currently tested only on Linux.
- Hardware: An NVIDIA GPU with at least 16 GB of memory is required. The code has been verified on NVIDIA A6000 GPUs.
- Software:
  - Conda is recommended for managing dependencies.
  - Python 3.6 or higher is required.
Create a new conda environment named `BOOTPLACE` and install the dependencies:

```bash
conda env create --file=BOOTPLACE.yml
```
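After activating the environment, a quick sanity check that PyTorch sees the GPU can save debugging time later (this assumes PyTorch is among the dependencies listed in `BOOTPLACE.yml`):

```python
# Sanity check after `conda activate BOOTPLACE`; assumes PyTorch is
# listed in BOOTPLACE.yml.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # The hardware requirement above calls for >= 16 GB of GPU memory.
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")
```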
Download the DETR-R50 pretrained model for fine-tuning here and place it at `weights/detr-r50-e632da11.pth`.
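To verify the download, the checkpoint can be inspected before fine-tuning; the `"model"` key layout follows the official DETR release, so treat it as an assumption if the file differs:

```python
# Inspect the downloaded DETR-R50 checkpoint; the "model" key layout
# is assumed from the official DETR release format.
import torch

ckpt = torch.load("weights/detr-r50-e632da11.pth", map_location="cpu")
print("Top-level keys:", list(ckpt.keys()))
state_dict = ckpt.get("model", ckpt)
print(f"{len(state_dict)} parameter tensors")
```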
We provide the following pretrained models:
| Model | Description | #Params | Download |
| --- | --- | --- | --- |
| BOOTPLACE_Cityscapes | Multiple supervision | 523M | Download |
We provide a large-scale street-scene vehicle placement dataset (Download) curated from Cityscapes. The file structure is as follows:
```
├── train
│   ├── backgrounds
│   │   ├── imgID.png
│   │   └── ……
│   ├── objects
│   │   ├── imgID
│   │   │   ├── object_name_ID.png
│   │   │   └── ……
│   │   └── ……
│   ├── location
│   │   ├── imgID
│   │   │   ├── object_name_ID.txt
│   │   │   └── ……
│   │   └── ……
│   └── annotations.json
└── test
    ├── backgrounds
    │   ├── imgID.png
    │   └── ……
    ├── backgrounds_single
    │   ├── imgID.png
    │   └── ……
    ├── objects
    │   ├── imgID
    │   │   ├── object_name_ID.png
    │   │   └── ……
    │   └── ……
    ├── objects_single
    │   ├── imgID
    │   │   ├── object_name_ID.png
    │   │   └── ……
    │   └── ……
    ├── location
    │   ├── imgID
    │   │   ├── object_name_ID.txt
    │   │   └── ……
    │   └── ……
    ├── location_single
    │   ├── imgID
    │   │   ├── object_name_ID.txt
    │   │   └── ……
    │   └── ……
    └── annotations.json
```
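As a quick illustration, the layout above can be traversed with standard library calls; pairing object crops with location files by matching filenames is our assumption from the tree:

```python
# Walk the training split of the dataset layout above; pairing object
# crops with location files by filename is an assumption.
from pathlib import Path

root = Path("data/Cityscapes/train")

for bg in sorted((root / "backgrounds").glob("*.png")):
    img_id = bg.stem
    crops = sorted((root / "objects" / img_id).glob("*.png"))
    locs = sorted((root / "location" / img_id).glob("*.txt"))
    print(f"{img_id}: {len(crops)} object crops, {len(locs)} location files")
```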
To train a model on Cityscapes:

```bash
python -m main \
    --epochs 200 \
    --batch_size 2 \
    --save_freq 10 \
    --set_cost_class 1 \
    --ce_loss_coef 1 \
    --num_queries 120 \
    --eos_coef 0.1 \
    --lr 1e-4 \
    --data_path data/Cityscapes \
    --output_dir results/Cityscapes_ckpt \
    --resume weights/detr-r50-e632da11.pth
```
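Note that `--resume` initializes fine-tuning from the DETR-R50 weights downloaded above; checkpoints are presumably written to `--output_dir` every `--save_freq` (here 10) epochs.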
To run inference with a trained model:

```bash
python test.py \
    --num_queries 120 \
    --data_path data/Cityscapes \
    --pretrained_model 'results/Cityscapes_ckpt/checkpoint.pth' \
    --im_root 'data/Cityscapes/test' \
    --output_dir 'results/Cityscape_inference'
```
This project is licensed under the terms of the MIT license.
If you find this work helpful, please consider citing our paper:
```bibtex
@inproceedings{zhou2025bootplace,
  title={BOOTPLACE: Bootstrapped Object Placement with Detection Transformers},
  author={Zhou, Hang and Zuo, Xinxin and Ma, Rui and Cheng, Li},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19294--19303},
  year={2025}
}
```