
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

CVPR 2025 (Highlight)

Contents

  • Setup
  • Model Weights
  • Quick Start
  • Training
  • Evaluation
  • Metrics
  • BibTeX
  • Acknowledgement

Setup

Environment

conda env create -f environment.yaml  # The env name is "instamanip"

Dataset

Download the dataset collected in the work InstructPix2Pix. Unzip all 30 zip files into ./data/ip2p/.
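
If you prefer to script the extraction step, below is a minimal sketch; the source folder ./downloads/ip2p_zips is a hypothetical location for the downloaded archives, while the destination matches the path above.

import zipfile
from pathlib import Path

# Hypothetical folder holding the 30 downloaded archives; adjust to where
# you actually saved them. The destination is the path required by the repo.
src_dir = Path("./downloads/ip2p_zips")
dst_dir = Path("./data/ip2p")
dst_dir.mkdir(parents=True, exist_ok=True)

for zip_path in sorted(src_dir.glob("*.zip")):
    print(f"Extracting {zip_path.name} ...")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dst_dir)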

Pre-trained Checkpoints

Download the following pre-trained checkpoints (SEED-X-17B, Qwen-VL-Chat and stable-diffusion-xl-base-1.0) and save them under ./pretrained.

Move cvlm_llama2_tokenizer_100img_and_224loc_addpatch, seed_detokenizer and seed_x from SEED-X-17B to ./pretrained.

Replace the added_tokens.json under cvlm_llama2_tokenizer_100img_and_224loc_addpatch with our released json file in ./pretrained.

mv ./pretrained/added_tokens.json ./pretrained/cvlm_llama2_tokenizer_100img_and_224loc_addpatch/

Run the following script to save the weights of the visual encoder of Qwen-VL-Chat to ./pretrained/QwenViT.

python src/tools/reload_qwen_vit.py
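
For reference, the conversion conceptually amounts to loading Qwen-VL-Chat and saving its vision tower on its own. The sketch below only illustrates that idea; the model id Qwen/Qwen-VL-Chat, the attribute path transformer.visual and the output filename are assumptions, and src/tools/reload_qwen_vit.py remains the authoritative script.

import torch
from pathlib import Path
from transformers import AutoModelForCausalLM

# Load Qwen-VL-Chat (trust_remote_code is required for this model family),
# then save only the vision tower. The attribute path `transformer.visual`
# and the output filename are assumptions made for this sketch.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, torch_dtype=torch.float16
)
out_dir = Path("./pretrained/QwenViT")
out_dir.mkdir(parents=True, exist_ok=True)
torch.save(model.transformer.visual.state_dict(), out_dir / "qwen_vit.pt")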

Finally, you should have the following directories under ./pretrained. We don't need the other files.

./pretrained
     |
     |- QwenViT
     |- cvlm_llama2_tokenizer_100img_and_224loc_addpatch
     |- seed_detokenizer
     |- seed_x
     |- stable-diffusion-xl-base-1.0
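
As a quick sanity check, here is a minimal sketch that only verifies these directories exist before moving on.

# Sanity-check sketch: confirm the expected checkpoint directories are in
# place under ./pretrained before running inference or training.
from pathlib import Path

required = [
    "QwenViT",
    "cvlm_llama2_tokenizer_100img_and_224loc_addpatch",
    "seed_detokenizer",
    "seed_x",
    "stable-diffusion-xl-base-1.0",
]

pretrained = Path("./pretrained")
missing = [name for name in required if not (pretrained / name).is_dir()]
print("All checkpoints in place." if not missing else f"Missing: {missing}")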

Model Weights

Our model weights are available on HuggingFace. Four models are released in this repo (a download sketch follows the list below).

  • InstaManip-17B-1shot: model trained specifically for 1-shot image manipulation.
  • InstaManip-17B-2shot: model trained specifically for 2-shot image manipulation.
  • InstaManip-17B-3shot: model trained specifically for 3-shot image manipulation.
  • InstaManip-17B-dynamic: model trained for an arbitrary number of exemplar image pairs.
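
One way to fetch a model programmatically is with huggingface_hub. This is a sketch under the assumption that the weights are hosted under a repo id like the one shown below, so confirm the exact namespace and model name on the HuggingFace page first.

# Download sketch (illustrative): fetch one of the released models.
# The repo id below is an assumption -- verify it on HuggingFace.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="BolinLai/InstaManip-17B-dynamic",
    local_dir="./model_weights/InstaManip-17B-dynamic",
)
print("Model files saved to", local_path)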

Quick Start

We provide a few examples in ./demo for a quick start with our model. After setting up the environment and downloading all pre-trained checkpoints and our model weights, run one of the following commands to edit a given image.

# 1-shot
python src/inference/run_model.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin

# multi-shot
python src/inference/run_model_multishot.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin

You can try different examples or use your own image by updating source_image_path, exemplar_source_image_path, exemplar_target_image_path and instruction in src/inference/run_model.py and src/inference/run_model_multishot.py.
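
For instance, the relevant variables in src/inference/run_model.py would be edited along these lines; the file names and instruction below are placeholders, not assets shipped with the repo.

# Inside src/inference/run_model.py -- illustrative values only; point the
# placeholder paths at files in ./demo or at your own images.
source_image_path = "./demo/my_source.jpg"                    # image to edit
exemplar_source_image_path = "./demo/my_exemplar_before.jpg"  # exemplar before editing
exemplar_target_image_path = "./demo/my_exemplar_after.jpg"   # exemplar after editing
instruction = "Turn the sky into a sunset."                   # editing instruction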

Training

Run the following command to train the model on 8 GPUs. You can change the number of GPUs by updating --nproc_per_node in train.sh.

bash scripts/train.sh

You can use different hyperparameters in scripts/train.sh (e.g., learning rate, iterations) and configs/data/dataset.yaml (e.g., batch size, number of exemplar images).

We also enable torch.multiprocessing.set_start_method("spawn") in scripts/train.sh for training on H100. If you run the code on A100, this line can be commented out for faster training.
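
For reference, the start-method call mentioned above follows the standard PyTorch pattern; the snippet below is only an illustration of that pattern, not the repository's exact training entry code.

# Illustrative pattern for the spawn start method mentioned above.
import torch.multiprocessing as mp

if __name__ == "__main__":
    # Enabled for H100 training; on A100 it can be skipped for faster training.
    mp.set_start_method("spawn", force=True)
    # ... build the dataloaders and launch training as usual ...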

Evaluation

Go to the checkpoint directory that you want to evaluate and convert the model weights. The zero_to_fp32.py script is generated by DeepSpeed inside each checkpoint directory.

python zero_to_fp32.py . ./pytorch_model.bin

Go back to the project root directory and run the following commands. The inference results will be saved in checkpoint-xxxx/inference-xxxx-xx.

Using one pair of exemplar images (1-shot):

# In distribution
python src/inference/eval_model.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin --setting in_dist

# Out of distribution
python src/inference/eval_model.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin --setting out_of_dist

Using multiple exemplar images (few-shot):

# In distribution
python src/inference/eval_model_multishot.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin --example_num 2 --setting in_dist

# Out of distribution
python src/inference/eval_model_multishot.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin --example_num 2 --setting out_of_dist

# Out of distribution (diverse)
python src/inference/eval_model_multishot.py --ckpt ./train_output/your_path/checkpoint-xxxx/pytorch_model.bin --example_num 2 --setting out_of_dist_diverse

Most instructions have 3-4 instances in the IP2P dataset. The model will reuse duplicate exemplar images if example_num is set above the number of available instances.
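
The reuse behavior can be pictured as cycling over the available instances; the snippet below only illustrates the described behavior and is not the repository's actual sampling code.

# Illustration: an instruction with only three exemplar pairs in IP2P but
# example_num set to 5 -- the selection wraps around.
from itertools import cycle, islice

available_exemplars = ["pair_1", "pair_2", "pair_3"]
example_num = 5
selected = list(islice(cycle(available_exemplars), example_num))
print(selected)  # ['pair_1', 'pair_2', 'pair_3', 'pair_1', 'pair_2']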

Metrics

Run the following command to compute the evaluation metrics on the saved inference results.

python src/metrics/metrics.py  --gen_path ./train_output/your_path/checkpoint-xxxx/inference-xxxx-xx

BibTeX

If you find our paper helpful to your work, please cite it with the following BibTeX entry.

@inproceedings{lai2025unleashing,
  title={Unleashing in-context learning of autoregressive models for few-shot image manipulation},
  author={Lai, Bolin and Juefei-Xu, Felix and Liu, Miao and Dai, Xiaoliang and Mehta, Nikhil and Zhu, Chenguang and Huang, Zeyi and Rehg, James M and Lee, Sangmin and Zhang, Ning and Xiao, Tong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={18346--18357},
  year={2025}
}

Acknowledgement

Our work was developed based on SEED-X. We appreciate the contributors for their awesome codebase.
