- 🔥2025.10.10: Release the DreamOmni2 editing and generation demos.
- 🔥2025.10.10: Release the DreamOmni2 Benchmark.
- 🔥2025.10.10: Release DreamOmni2 code and models.
- 🔥2025.10.09: Release the DreamOmni2 tech report.
(1) Multimodal Instruction-based Generation
For traditional subject-driven generation based on concrete objects, DreamOmni2 achieves the best results among open-source models, showing superior identity and pose consistency. In addition, DreamOmni2 can reference abstract attributes (material, texture, makeup, hairstyle, posture, design style, artistic style, and more), even surpassing commercial models in this area.
(2) Multimodal Instruction-based Editing
Unlike traditional instruction-based editing models, DreamOmni2 supports multimodal instruction-based editing. Everyday editing tasks often involve elements that are difficult to describe with language alone and require reference images. Our model addresses this need, supporting references to arbitrary concrete objects and abstract attributes, with performance comparable to commercial models.
(3) Unified Generation and Editing Model
Building upon these two new tasks, we introduce DreamOmni2, which is capable of multimodal instruction-based editing and generation guided by any concrete or abstract concept. Overall, DreamOmni2 is a more intelligent and powerful open-source unified generation and editing model, offering enhanced capabilities across a wide range of tasks.
Editing and generation are distinct tasks. Editing requires strict consistency in preserving the non-edited regions of the source image, while generation only needs to retain the identity (ID), subject (IP), or attribute from the reference images as specified by the instruction, allowing the entire image to be regenerated with a focus on aesthetics. We have found that instructions for generation and editing are often similar, so we separate the two tasks to make it easier for users to choose the appropriate task type.
First, install the necessary dependencies:
git clone https://github.com/dvlab-research/DreamOmni2
cd ./DreamOmni2
pip install -r requirements.txt
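If you prefer an isolated environment, the dependencies can also be installed into a standard Python virtual environment; a minimal sketch (the repository does not state a required Python version, so this is only a common setup pattern):
# Create and activate a virtual environment, then install the repo's requirements.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt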
Next, download the DreamOmni2 weights into the models folder.
huggingface-cli download --resume-download --local-dir-use-symlinks False xiabs/DreamOmni2 --local-dir ./models
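If the huggingface-cli command is not found, it is provided by the huggingface_hub package and can be installed first (a minimal sketch, assuming it is not already pulled in by requirements.txt):
# Install the Hugging Face CLI, then re-run the download command above.
pip install -U "huggingface_hub[cli]"
After the download completes, the weights will be available under ./models.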
Multimodal Instruction-based Editing
Note that for editing tasks, due to the format of the training data, the image to be edited must be placed in the first position (before the reference images).
python3 inference_edit.py \
--input_img_path "example_input/edit_tests/src.jpg" "example_input/edit_tests/ref.jpg" \
--input_instruction "Make the woman from the second image stand on the road in the first image." \
--output_path "example_input/edit_tests/edit_res.png"
Multimodal Instruction-based Generation
python3 inference_gen.py \
--input_img_path "example_input/gen_tests/img1.jpg" "example_input/gen_tests/img2.jpg" \
--input_instruction "In the scene, the character from the first image stands on the left, and the character from the second image stands on the right. They are shaking hands against the backdrop of a spaceship interior." \
--output_path "example_input/gen_tests/gen_res.png" \
--height 1024 \
--width 1024
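The --height and --width flags are not limited to the square example above; a minimal sketch of a portrait-format call, assuming the model also accepts non-square resolutions (not verified here):
python3 inference_gen.py \
--input_img_path "example_input/gen_tests/img1.jpg" "example_input/gen_tests/img2.jpg" \
--input_instruction "In the scene, the character from the first image stands on the left, and the character from the second image stands on the right. They are shaking hands against the backdrop of a spaceship interior." \
--output_path "example_input/gen_tests/gen_res_portrait.png" \
--height 1344 \
--width 768
The repository also provides web demos for editing and generation; the following commands launch them on separate GPUs and ports.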
CUDA_VISIBLE_DEVICES=0 python web_edit.py \
--vlm_path PATH_TO_VLM \
--edit_lora_path PATH_TO_EDIT_LORA \
--server_name "0.0.0.0" \
--server_port 7860
CUDA_VISIBLE_DEVICES=1 python web_generate.py \
--vlm_path PATH_TO_VLM \
--gen_lora_path PATH_TO_GENERATION_LORA \
--server_name "0.0.0.0" \
--server_port 7861
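If the demos are launched on a remote machine, a standard SSH port forward makes them reachable from a local browser; a minimal sketch (user@remote-server is a placeholder):
# Forward the editing (7860) and generation (7861) demo ports to the local machine.
ssh -L 7860:localhost:7860 -L 7861:localhost:7861 user@remote-server
Then open http://localhost:7860 (editing) and http://localhost:7861 (generation) locally.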
This project strives to have a positive impact on the domain of AI-driven image generation. Users are free to create images with this tool, but they are expected to comply with local laws and use it responsibly. The developers do not assume any responsibility for potential misuse by users.
If you find DreamOmni2 helpful, please ⭐ the repo.
If you find this project useful for your research, please consider citing our paper.
If you have any comments or questions, please open a new issue or contact Bin Xia.