
DreamOmni2: Multimodal Instruction-based Editing and Generation


🔥 News

Introduction

(1) Multimodal Instruction-based Generation

For traditional subject-driven generation based on concrete objects, DreamOmni2 achieves the best results among open-source models, showing superior identity and pose consistency. Additionally, DreamOmni2 can reference abstract attributes (such as material, texture, makeup, hairstyle, posture, design style, artistic style, etc.), even surpassing commercial models in this area.

(2) Multimodal Instruction-based Editing

Beyond traditional instruction-based editing models, DreamOmni2 supports multimodal instruction editing. In everyday editing tasks, there are often elements that are difficult to describe purely with language and require reference images. Our model addresses this need, supporting references to any concrete objects and abstract attributes, with performance comparable to commercial models.

(3) Unified Generation and Editing Model

Building upon these two new tasks, we introduce DreamOmni2, which is capable of multimodal instruction-based editing and generation guided by any concrete or abstract concept. Overall, DreamOmni2 is a more intelligent and powerful open-source unified generation and editing model, offering enhanced capabilities across a wide range of tasks.

Why Separate Editing and Generation Models?

Editing and generation are distinct tasks. Editing requires strict consistency in preserving the non-edited areas of the source image, while generation only needs to retain the ID, IP, or attributes from the reference image as specified by the instructions, allowing the entire image to be regenerated with a focus on aesthetics. We’ve found that the instructions for generation and editing are often similar, so we’ve separated the two tasks to make it easier for users to choose the appropriate task type.

Quick Start

Requirements and Installation

First, install the necessary dependencies:

git clone https://github.com/dvlab-research/DreamOmni2
cd ./DreamOmni2
pip install -r requirements.txt

Next, download the DreamOmni2 weights into the models folder.

huggingface-cli download --resume-download --local-dir-use-symlinks False xiabs/DreamOmni2 --local-dir ./models
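
If you prefer to script the download, the snippet below is a minimal Python sketch using the huggingface_hub API; it fetches the same xiabs/DreamOmni2 repo into ./models (downloads resume automatically in recent versions of the library):

# Minimal sketch: fetch the DreamOmni2 weights via the huggingface_hub API.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xiabs/DreamOmni2",  # same repo as the CLI command above
    local_dir="./models",        # folder the inference commands below expect
)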

Inference

Multimodal Instruction-based Editing

Note that for editing tasks, due to the format of the training data, the image to be edited must be placed in the first position.

python3 inference_edit.py \
    --input_img_path "example_input/edit_tests/src.jpg" "example_input/edit_tests/ref.jpg" \
    --input_instruction "Make the woman from the second image stand on the road in the first image." \
    --output_path "example_input/edit_tests/edit_res.png"
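
To process several edits in one run, a small driver can loop over (source, reference, instruction, output) tuples and invoke the same script. This is an illustrative sketch rather than an official batch interface; the job list simply reuses the example files above:

# Hypothetical batch driver around inference_edit.py.
import subprocess

jobs = [
    ("example_input/edit_tests/src.jpg",
     "example_input/edit_tests/ref.jpg",
     "Make the woman from the second image stand on the road in the first image.",
     "example_input/edit_tests/edit_res.png"),
]

for src, ref, instruction, out in jobs:
    # The image to be edited must come first (see the note above).
    subprocess.run([
        "python3", "inference_edit.py",
        "--input_img_path", src, ref,
        "--input_instruction", instruction,
        "--output_path", out,
    ], check=True)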

Multimodal Instruction-based Generation

python3 inference_gen.py \
    --input_img_path "example_input/gen_tests/img1.jpg" "example_input/gen_tests/img2.jpg" \
    --input_instruction "In the scene, the character from the first image stands on the left, and the character from the second image stands on the right. They are shaking hands against the backdrop of a spaceship interior." \
    --output_path "example_input/gen_tests/gen_res.png" \
    --height 1024 \
    --width 1024
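
--height and --width set the output resolution. For a non-square canvas, the same flags can be changed, as in this sketch (that resolutions other than 1024x1024 are supported is an assumption worth verifying against the released weights):

# Hypothetical 16:9 generation call reusing the example inputs above.
import subprocess

subprocess.run([
    "python3", "inference_gen.py",
    "--input_img_path", "example_input/gen_tests/img1.jpg",
                        "example_input/gen_tests/img2.jpg",
    "--input_instruction", "The character from the first image stands beside "
                           "the character from the second image.",
    "--output_path", "example_input/gen_tests/gen_res_wide.png",
    "--height", "768",
    "--width", "1344",
], check=True)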

Web Demo

CUDA_VISIBLE_DEVICES=0 python web_edit.py \
    --vlm_path PATH_TO_VLM \
    --edit_lora_path PATH_TO_EDIT_LORA \
    --server_name "0.0.0.0" \
    --server_port 7860


CUDA_VISIBLE_DEVICES=1 python web_generate.py \
    --vlm_path PATH_TO_VLM \
    --gen_lora_path PATH_TO_GENERATION_LORA \
    --server_name "0.0.0.0" \
    --server_port 7861
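
With the settings above, the editing demo is served at http://localhost:7860 and the generation demo at http://localhost:7861, following the --server_port flags. PATH_TO_VLM and the LoRA paths should point at the corresponding weights, presumably inside the ./models folder downloaded earlier.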

Disclaimer

This project aims to contribute positively to the field of AI-driven image generation. Users are free to create images with this tool, but they are expected to comply with local laws and use it responsibly. The developers assume no responsibility for potential misuse.

Citation

If you find DreamOmni2 helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper.

Contact

If you have any comments or questions, please open a new issue or contact Bin Xia.
