
Lumina-Accessory: Instruction Fine-tuned Rectified Flow Transformer for Universal Image Generation

✨ Features

Lumina-Accessory is a multi-task instruction fine-tuning framework for the Lumina series (currently supporting Lumina-Image-2.0). This repository includes:

  • 🧠 Tuning Code – Unifies diverse image-to-image tasks via sequence concatenation, supporting both universal and task-specific model tuning.

  • ⚖️ Instruction Fine-tuned Universal Model Weights – Initialized from Lumina-Image-2.0, supporting:

    • 🖼️ Spatial conditional generation
    • 🔧 Infilling & Restoration
    • 💡 Relighting
    • 🎨 Subject-driven generation
    • ✏️ Instruction-based editing
  • 🚀 Inference Code & Gradio Demo – Test and showcase the universal model’s capabilities interactively!

📰 News

  • [2025-4-21] 🚀🚀🚀 We are excited to release Lumina-Accessory, including:
    • 🎯 Checkpoints, Fine-Tuning and Inference code.

📑 Open-source Plan

  • Tuning code
  • Inference Code
  • Checkpoints
  • Web Demo (Gradio)

🏠 Architecture

✨ Lumina-Accessory directly leverages the self-attention mechanism in DiT to enable interaction between condition and target image tokens, consistent with approaches such as OminiControl, DSD, and VisualCloze.

✨ Built on top of Lumina-Image-2.0, Lumina-Accessory introduces an additional condition processor, initialized with the weights of the latent processor.

✨ Similar to OminiControl, we modulate both condition and target image tokens with different time conditions, and apply distinct positional embeddings for different types of conditions.
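The three points above can be summarized in a toy NumPy sketch: condition and target tokens are concatenated along the sequence axis, given distinct time modulation and positional signals, and then attended over jointly. All names, shapes, and the scalar modulation are illustrative assumptions, not the actual Lumina-Accessory implementation (which uses learned, AdaLN-style timestep modulation inside DiT blocks).

```python
import numpy as np

def joint_attention(cond_tokens, target_tokens, t_cond=1.0, t_target=0.5, pos_offset=1000):
    """Toy sketch: concatenate condition and target tokens and run one
    self-attention pass over the joint sequence, so every target token
    can attend to every condition token (illustrative only)."""
    d = cond_tokens.shape[-1]
    # Distinct scalar "time modulation" per stream; the real model uses
    # learned modulation conditioned on the diffusion timestep.
    cond = cond_tokens * t_cond
    tgt = target_tokens * t_target
    # Distinct positional signal for the condition stream (offset phase).
    n_c, n_t = cond.shape[0], tgt.shape[0]
    pos_c = np.sin((np.arange(n_c)[:, None] + pos_offset) / d)
    pos_t = np.sin(np.arange(n_t)[:, None] / d)
    x = np.concatenate([cond + pos_c, tgt + pos_t], axis=0)  # (n_c + n_t, d)
    # Plain single-head self-attention over the joint sequence.
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

cond = np.random.randn(4, 8)
tgt = np.random.randn(6, 8)
out = joint_attention(cond, tgt)
print(out.shape)  # (10, 8)
```

Because the interaction happens inside ordinary self-attention over the concatenated sequence, no extra cross-attention branch is needed.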


🎮 Model Zoo

| Resolution | Parameters | Text Encoder | VAE | Download URL |
|------------|------------|--------------|-----|--------------|
| 1024 | 2.6B | Gemma-2-2B | FLUX-VAE-16CH | Hugging Face |

📊 Model Capability

| Task Type | Training Data | Model Ability |
|-----------|---------------|---------------|
| Spatial Conditional Generation | Internal Data | 😄 (Good) |
| Infilling & Restoration | Internal Data | 😄 (Good) |
| Relighting | IC-Light Synthetic Data | 😊 (Moderate) |
| Subject-Driven Generation | Subject200K | 😐 (Basic) |
| Instruction-Based Editing | OmniEdit-1.2M | 😐 (Basic) |

💻 Finetuning Code

1. Create a conda environment and install PyTorch

conda create -n Lumina2 -y
conda activate Lumina2
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

2. Install dependencies

pip install -r requirements.txt

3. Install flash-attn

pip install flash-attn --no-build-isolation

4. Prepare data

Place the paths to your data files in ./configs/data.yaml.
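The exact schema of data.yaml is defined by this repository's training code; as a purely hypothetical illustration (keys and paths invented here), an entry registering two data files might look like:

```yaml
# Hypothetical layout; consult configs/data.yaml in the repo for the real schema.
META:
  - path: /path/to/task_a_pairs.json
  - path: /path/to/task_b_pairs.json
```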

For tasks where the condition can be generated online, each image-text training record should follow this format:

{
    "image_path": "path/to/your/image",
    "prompt": "a description of the image"
}

For tasks that require loading a condition image, the training data format should be as follows:

{
    "input_image": "path/to/your/condition",
    "output_image": "path/to/your/target",
    "prompt": "a description of the image"
}
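The two record formats above can be distinguished by their keys; a small helper like the following (illustrative only, not part of the Lumina-Accessory codebase) can sanity-check a dataset file before training:

```python
import json

# Required keys per task family, matching the two formats above.
ONLINE_CONDITION_KEYS = {"image_path", "prompt"}
PAIRED_CONDITION_KEYS = {"input_image", "output_image", "prompt"}

def validate_record(record: dict) -> str:
    """Return which training format a record matches, or raise ValueError."""
    keys = set(record)
    if ONLINE_CONDITION_KEYS <= keys:
        return "online-condition"
    if PAIRED_CONDITION_KEYS <= keys:
        return "paired-condition"
    raise ValueError(f"record missing required keys: {record}")

record = json.loads('{"image_path": "img.png", "prompt": "a cat"}')
print(validate_record(record))  # online-condition
```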

5. Start finetuning

bash scripts/run_1024_finetune.sh

🚀 Inference Code

We support multiple solvers including Midpoint Solver, Euler Solver, and DPM Solver for inference.
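These solvers differ in how they integrate the sampling ODE dx/dt = v(x, t): Euler takes one velocity evaluation per step, while midpoint evaluates the velocity at the interval's center for second-order accuracy. A toy comparison on a known velocity field (illustrative, not the repository's sampler):

```python
import math

def integrate(v, x0, steps, method="euler"):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed step size."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        if method == "euler":
            x = x + dt * v(x, t)
        elif method == "midpoint":
            # Evaluate velocity at the midpoint of the interval.
            x_mid = x + 0.5 * dt * v(x, t)
            x = x + dt * v(x_mid, t + 0.5 * dt)
        else:
            raise ValueError(method)
    return x

v = lambda x, t: -x          # toy velocity field; exact solution is x0 * e^(-t)
exact = math.exp(-1.0)
err_euler = abs(integrate(v, 1.0, 10, "euler") - exact)
err_mid = abs(integrate(v, 1.0, 10, "midpoint") - exact)
print(err_mid < err_euler)  # True: midpoint is more accurate at equal step count
```

In practice this is why higher-order solvers can reach comparable quality with fewer sampling steps.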

Note

Both the Gradio demo and the direct inference script use .pth weight files, which we have uploaded to Hugging Face; simply point the --ckpt argument at the download directory.

Note

The code has just been cleaned up; if you encounter any issues, please let us know.

  • Direct Inference
NUM_STEPS=50
CFG_SCALE=4.0
TIME_SHIFTING_FACTOR=6
SEED=20
SOLVER=euler
TASK_TYPE="Image Infilling"
CAP_DIR=./examples/caption_list.json
OUT_DIR=./examples/outputs
MODEL_CHECKPOINT=/path/to/your/ckpt

python -u sample_accessory.py --ckpt ${MODEL_CHECKPOINT} \
--image_save_path ${OUT_DIR} \
--solver ${SOLVER} \
--num_sampling_steps ${NUM_STEPS} \
--caption_path ${CAP_DIR} \
--seed ${SEED} \
--time_shifting_factor ${TIME_SHIFTING_FACTOR} \
--cfg_scale ${CFG_SCALE} \
--batch_size 1 \
--rank 0 \
--task_type "${TASK_TYPE}"
  • Gradio Demo
PRECISION="bf16" 
SOLVER="euler"
VAE="flux"
SHARE=False
MODEL_CHECKPOINT=/path/to/your/ckpt

torchrun --nproc_per_node=1 --master_port=18187 gradio_demo.py \
  --ckpt "$MODEL_CHECKPOINT" \
  --precision "$PRECISION" \
  --solver "$SOLVER" \
  --vae "$VAE" \
  --share "$SHARE"


Citation

If you find the provided code or models useful for your research, please consider citing:

@Misc{lumina-accessory,
  author = {Alpha-VLLM Team},
  title  = {Lumina-Accessory GitHub Page},
  year   = {2025},
}

Related Work

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

OminiControl: Minimal and Universal Control for Diffusion Transformer

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

OminiControl2: Efficient Conditioning for Diffusion Transformers
