Lumina-Accessory is a multi-task instruction fine-tuning framework designed for the Lumina series (currently supporting Lumina-Image-2.0). This repository includes:
- 🧠 Tuning Code – Unifies various image-to-image tasks in a sequence concatenation manner, supporting both universal and task-specific model tuning.
- ⚖️ Instruction Fine-tuned Universal Model Weights – Initialized from Lumina-Image-2.0, supporting:
  - 🖼️ Spatial conditional generation
  - 🔧 Infilling & Restoration
  - 💡 Relighting
  - 🎨 Subject-driven generation
  - ✏️ Instruction-based editing
- 🚀 Inference Code & Gradio Demo – Test and showcase the universal model's capabilities interactively!
- [2025-04-21] 🚀🚀🚀 We are excited to release Lumina-Accessory, including 🎯 checkpoints, fine-tuning, and inference code:
  - Tuning code
  - Inference code
  - Checkpoints
  - Web demo (Gradio)
✨ Lumina-Accessory directly leverages the self-attention mechanism in DiT to let condition and target image tokens interact, consistent with approaches such as OminiControl, DSD, and VisualCloze.
✨ Built on top of Lumina-Image-2.0, Lumina-Accessory introduces an additional condition processor, initialized with the weights of the latent processor.
✨ Similar to OminiControl, we modulate both condition and target image tokens with different time conditions, and apply distinct positional embeddings for different types of conditions.
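To make the sequence-concatenation and dual-time-modulation scheme above concrete, here is a minimal PyTorch sketch. All names, shapes, and the positional-offset scheme are hypothetical illustrations under stated assumptions, not the actual Lumina-Accessory implementation.

```python
import torch
import torch.nn as nn

B, N, D = 2, 256, 1024                      # batch, tokens per image, hidden dim (hypothetical)

cond_tokens = torch.randn(B, N, D)          # from the condition processor
img_tokens  = torch.randn(B, N, D)          # noisy target image latents
t_emb_cond  = torch.randn(B, D)             # time embedding for the (clean) condition
t_emb_img   = torch.randn(B, D)             # time embedding for the noisy target

scale, shift = nn.Linear(D, D), nn.Linear(D, D)

def modulate(x, t_emb):
    # AdaLN-style modulation: each token stream gets its own time condition.
    return x * (1 + scale(t_emb)).unsqueeze(1) + shift(t_emb).unsqueeze(1)

cond = modulate(cond_tokens, t_emb_cond)
img  = modulate(img_tokens, t_emb_img)

# Distinct positional ids per stream (hypothetical offset scheme) would let
# position encodings such as RoPE tell condition tokens from target tokens.
pos_img, pos_cond = torch.arange(N), torch.arange(N) + N

# Sequence concatenation: one self-attention pass over the joint sequence
# lets condition and target tokens interact directly.
seq = torch.cat([cond, img], dim=1)         # (B, 2N, D)
attn = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
out, _ = attn(seq, seq, seq)
```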
| Resolution | Parameters | Text Encoder | VAE | Download URL |
|---|---|---|---|---|
| 1024 | 2.6B | Gemma-2-2B | FLUX-VAE-16CH | Hugging Face |
| Task Type | Training Data | Model Ability |
|---|---|---|
| Spatial Conditional Generation | Internal Data | 😄 (Good) |
| Infilling & Restoration | Internal Data | 😄 (Good) |
| Relighting | IC-Light Synthetic Data | 😊 (Moderate) |
| Subject-Driven Generation | Subjects200K | 😐 (Basic) |
| Instruction-Based Editing | OmniEdit-1.2M | 😐 (Basic) |
```bash
# Create and activate the environment
conda create -n Lumina2 -y
conda activate Lumina2

# Install Python, PyTorch (CUDA 12.1), and the project dependencies
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
You can place the paths to your data files in `./configs/data.yaml`.
For tasks where the condition can be generated online, each image-text training record should follow this format:

```json
{
    "image_path": "path/to/your/image",
    "prompt": "a description of the image"
}
```
For tasks that require loading a condition image, the training data format should be as follows:
```json
{
    "input_image": "path/to/your/condition",
    "output_image": "path/to/your/target",
    "prompt": "a description of the image"
}
```
```bash
bash scripts/run_1024_finetune.sh
```
We support multiple solvers for inference, including the Midpoint, Euler, and DPM solvers; a generic sketch of the Euler and midpoint update rules follows.
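For illustration only, here is a generic PyTorch sketch of the Euler and midpoint ODE update rules used in flow-matching samplers. The `velocity` function below is a toy stand-in for the model's predicted velocity field, and the integration schedule is an assumption, not the repository's actual sampler.

```python
import torch

# Generic first- and second-order ODE steps (not the repo's sampler code).
def euler_step(velocity, x, t, dt):
    # First-order: one velocity evaluation per step.
    return x + dt * velocity(x, t)

def midpoint_step(velocity, x, t, dt):
    # Second-order: evaluate the velocity at the interval midpoint.
    x_mid = x + 0.5 * dt * velocity(x, t)
    return x + dt * velocity(x_mid, t + 0.5 * dt)

# Toy usage: integrate a placeholder field from t=0 to t=1 in 50 steps.
velocity = lambda x, t: -x          # placeholder field, NOT the model
x = torch.randn(1, 16, 32, 32)      # hypothetical latent shape
num_steps = 50
for i in range(num_steps):
    x = euler_step(velocity, x, i / num_steps, 1.0 / num_steps)
```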
> [!NOTE]
> Both the Gradio demo and the direct inference script use the `.pth` weight file, which can be downloaded from Hugging Face. We have uploaded the `.pth` weight files, so you can simply point the `--ckpt` argument at the download directory.
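If you want to sanity-check a downloaded weight file before pointing `--ckpt` at its directory, plain PyTorch loading is enough; the filename below is a placeholder, not the repository's actual file layout.

```python
import torch

# Generic check that a .pth file loads as a state dict (path is a placeholder).
state = torch.load("/path/to/your/ckpt/model.pth", map_location="cpu")
if isinstance(state, dict):
    print(list(state.keys())[:5])   # a few parameter names, as a sanity check
```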
> [!NOTE]
> The code has just been cleaned up; if you run into any issues, please let us know.
- Direct Inference

```bash
NUM_STEPS=50
CFG_SCALE=4.0
TIME_SHIFTING_FACTOR=6
SEED=20
SOLVER=euler
TASK_TYPE="Image Infilling"
CAP_DIR=./examples/caption_list.json
OUT_DIR=./examples/outputs
MODEL_CHECKPOINT=/path/to/your/ckpt

python -u sample_accessory.py --ckpt ${MODEL_CHECKPOINT} \
    --image_save_path ${OUT_DIR} \
    --solver ${SOLVER} \
    --num_sampling_steps ${NUM_STEPS} \
    --caption_path ${CAP_DIR} \
    --seed ${SEED} \
    --time_shifting_factor ${TIME_SHIFTING_FACTOR} \
    --cfg_scale ${CFG_SCALE} \
    --batch_size 1 \
    --rank 0 \
    --task_type "${TASK_TYPE}"
```
- Gradio Demo

```bash
PRECISION="bf16"
SOLVER="euler"
VAE="flux"
SHARE=False
MODEL_CHECKPOINT=/path/to/your/ckpt

torchrun --nproc_per_node=1 --master_port=18187 gradio_demo.py \
    --ckpt "$MODEL_CHECKPOINT" \
    --precision "$PRECISION" \
    --solver "$SOLVER" \
    --vae "$VAE" \
    --share "$SHARE"
```
If you find the provided code or models useful for your research, consider citing:

```bibtex
@Misc{lumina-accessory,
  author = {Alpha-VLLM Team},
  title  = {Lumina-Accessory GitHub Page},
  year   = {2025},
}
```
- Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
- OminiControl: Minimal and Universal Control for Diffusion Transformer
- Diffusion Self-Distillation for Zero-Shot Customized Image Generation
- OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
- VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
- OminiControl2: Efficient Conditioning for Diffusion Transformers