
Flux Omini Kontext framework for multi-image reference: training and inference

Python PyTorch Lightning License

Updates

  • Qwen-Image-Edit support added. A spatial character insertion model is available on HF (link below). This is an experimental model, not yet trained on a large dataset, and it sometimes inserts duplicate characters.
  • This repository includes ComfyUI nodes and patches that are compatible with the Nunchaku extension, so you can run the Nunchaku version of the Flux Kontext model with Omini Kontext at lightning speed.

πŸš€ Live Demo

demo.mp4

Replicate version: https://replicate.com/thefluxtrain/omini-kontext

Installation guides, workflows, tutorials, and demos

If you have trained your own model, you can use it on Replicate: upload the model to HF and enter its details on Replicate.

OminiKontext is a framework built around the FLUX.1-Kontext-dev model. We do not alter the model architecture; instead, we adjust the 3D RoPE embeddings to enable reference-based edits on a given image.

The approach is heavily inspired by the OminiControl project, which uses the same RoPE embedding trick to achieve reference-based image generation with the FLUX.1-dev model. However, FLUX.1-dev uses 2D RoPE embeddings, whereas Kontext uses 3D RoPE embeddings.

More details on delta values: see issue #12.
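
For intuition, here is a minimal, illustrative sketch (not the pipeline's internal code) of what the delta does: each latent token carries a 3D RoPE position id, and the reference tokens are offset by reference_delta before the embedding is computed, so the reference occupies a region of positional space that does not collide with the base image. All names and shapes below are assumptions.

# Illustrative sketch only: shifting the 3D RoPE position ids of reference tokens
# by reference_delta = [d_index, d_y, d_x]. Not the pipeline's actual API.
import torch

def shift_reference_ids(ref_ids, reference_delta=(0, 0, 96)):
    """ref_ids: (num_tokens, 3) position ids laid out as (index, y, x)."""
    delta = torch.tensor(reference_delta, dtype=ref_ids.dtype, device=ref_ids.device)
    return ref_ids + delta  # broadcast the same offset over every reference token

# Example: a 512x512 reference maps to a 32x32 latent grid (one token per 16x16 pixel patch).
ys, xs = torch.meshgrid(torch.arange(32), torch.arange(32), indexing="ij")
ref_ids = torch.stack([torch.zeros_like(ys), ys, xs], dim=-1).reshape(-1, 3)
shifted = shift_reference_ids(ref_ids, reference_delta=[0, 0, 96])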

🎨 Generated Samples

Spatial Character Insertion

The following examples demonstrate how the trained model can insert cartoon characters into existing scenes. The model takes a reference character placed at the desired position on a white image, plus a scene image, and inserts the character into the scene around that position. It takes some freedom in the exact placement (not exactly at the desired position), based on what is plausible for the scene (common sense).

I used 30 image pairs to train the model for the intuitive blending task. Sometimes the results are not good, but overall the model is able to blend the character into the scene. This is more of a proof of concept; I plan to train another model (also open source) on a much larger dataset to make it more robust.

Scene Reference Character Generated Result
Scene 1 Boy Reference Output 1
Scene 2 Boy Reference Output 2
Scene 3 Boy Reference Output 3
πŸ“‹ Click to expand/collapse more spatial character insertion examples
Scene Reference Character Generated Result
Scene 4 Boy Reference Output 4
Scene 5 Boy Reference Output 5
Scene 6 Boy Reference Output 6
Scene 7 Boy Reference Output 7
Scene 8 Boy Reference Output 8
Scene 9 Boy Reference Output 9
Scene 10 Boy Reference Output 10
Scene 11 Boy Reference Output 11
Scene 12 Boy Reference Output 12
Scene 13 Boy Reference Output 13
Scene 14 Boy Reference Output 14

Non-spatial Character Insertion

The following examples demonstrate how the trained model can insert cartoon characters into existing scenes. There is no spatial control over the character in this case.

Scene Reference Character Generated Result
Scene 1 Boy Reference Output 1
Scene 2 Boy Reference Output 2

More coming soon!

Model Comparison

The following table compares the Omini Kontext model with a character insertion LoRA against the vanilla FLUX.1-Kontext-dev model. For the comparison, we used "Add character to the image. The character is scared." as the prompt for all images.

Scene Reference Vanilla Omini
Living Room Boy Living Room Boy Vanilla Living Room Boy Omini
Living Room Dog Living Room Dog Vanilla Living Room Dog Omini
πŸ“‹ Click to expand/collapse more model comparison examples
Scene Reference Vanilla Omini
Living Room Girl Living Room Girl Vanilla Living Room Girl Omini
Forest Boy Forest Boy Vanilla Forest Boy Omini
Forest Girl Forest Girl Vanilla Forest Girl Omini
Forest Dog Forest Dog Vanilla Forest Dog Omini

Pretrained LoRA models

| Model Name | Delta | Description |
|---|---|---|
| character_3000.safetensors | [0, 0, 96] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Character on a white background. |
| spatial-character-test.safetensors | [1, 0, 0] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Upload a reference image of the same size as the base image, with the character placed where you want it to appear. |
| product_2000.safetensors | [0, 0, 96] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Product on a white background. |

Notes:

  1. Sometimes the product or character comes out too big compared to the rest of the scene; simply use a smaller-resolution image.
  2. Trained on 512x512 reference images, but works fine with 1024x1024.
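
For the spatial LoRA, here is a rough usage sketch, assuming the diffusers pipeline from the Basic Inference example below is already set up as `pipe` with spatial-character-test.safetensors loaded via load_lora_weights. The file paths, paste position, and prompt are hypothetical.

# Rough sketch for the spatial LoRA: build a white reference of the SAME size
# as the scene and paste the character roughly where it should end up
# (see note 1 above if the character comes out too big).
from PIL import Image

scene = Image.open("scene.jpg").convert("RGB")
character = Image.open("character.png").convert("RGB")

reference = Image.new("RGB", scene.size, "white")
reference.paste(character, (600, 300))  # hypothetical target position

result = pipe(
    image=scene,
    reference=reference,
    reference_delta=[1, 0, 0],  # delta for spatial-character-test (see table above)
    prompt="Add the character to the image.",  # hypothetical prompt
    guidance_scale=3.5,
    num_inference_steps=28,
)
result.images[0].save("spatial_output.png")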

πŸ“‹ To-do

  • Add Qwen-Image-Edit support
  • Extend to input multiple references.
  • Create more demos for various use cases. Community support needed!
  • Add Nunchaku integration in ComfyUI
  • Use dataset from HF for training
  • Script to push the dataset to HuggingFace
  • Create an easy-to-use ComfyUI integration that uses native ComfyUI nodes. Scroll to the end.
  • Make a data processing script, available in helpers/dataset_creator.ipynb
  • Add ways to control location and scale of the reference character
  • Speed up by removing irrelevant pixels
  • Deploy a public demo
  • Deploy a replicate version
  • Add ComfyUI integration - scroll to the bottom
  • Basic training script
  • Basic inference script

Model Training Plans

  • Qwen-Image-Edit character model: Train character models for Qwen-Image-Edit.
  • Person Models: Develop models for realistic human subjects
  • Clothes Models: Create models for clothing and fashion items
  • Subject Models: Train models for specific objects and items
  • Character Models: Train specialized models for anime/cartoon characters

πŸš€ Quick Start

Setup Environment

# Create conda environment
conda create -n omini-kontext python=3.10
conda activate omini-kontext

# Install dependencies
pip install -r requirements.txt

Basic Training

πŸ“¦ Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended: 24GB+ VRAM)
  • PyTorch 2.0+
  • HuggingFace account for model access

Install Dependencies

# Core requirements (quote the specifiers so the shell does not treat ">" as redirection)
pip install "torch>=2.0.0" "lightning>=2.0.0"

# Install diffusers from GitHub (required for FluxKontext pipeline)
pip install git+https://github.com/huggingface/diffusers

# Training-specific requirements
pip install -r requirements.txt

Verify Installation

import torch
from src.pipeline_flux_omini_kontext import FluxOminiKontextPipeline

# Test pipeline loading
pipe = FluxOminiKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev"
)
print("βœ… Installation successful!")

🎯 Usage

Basic Inference

from diffusers.utils import load_image
from src.pipeline_flux_omini_kontext import FluxOminiKontextPipeline
import torch

# Load pipeline
pipe = FluxOminiKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Load images
input_image = load_image("path/to/input.jpg")
reference_image = load_image("path/to/reference.jpg")

# Load Character OminiKontext LoRA
pipe.load_lora_weights(
    "saquiboye/omini-kontext-character",
    weight_name="character_3000.safetensors",
    adapter_name="lora_weights"
)

# Generate
result = pipe(
    image=input_image,
    reference=reference_image,
    reference_delta=[0, 0, 96],  # Position delta for reference
    prompt="A beautiful landscape with mountains",
    guidance_scale=3.5,
    num_inference_steps=28
)

# Save result
result.images[0].save("output.png")

Optimizing Reference Images

The optimise_image_condition function helps improve inference and training performance by preprocessing reference images to optimize token usage. This optimization removes irrelevant pixels while preserving the essential features needed for conditioning.

from src.utils.image_utils import optimise_image_condition
from PIL import Image

# Load your reference image
reference = Image.open("path/to/reference.jpg")

# Optimize the reference image
reference_delta = [0, 0, 96]
optimised_reference, new_reference_delta = optimise_image_condition(reference, reference_delta)

# Use in inference
result = pipe(
    image=input_image,
    reference=optimised_reference,  # Pass the optimised reference
    reference_delta=new_reference_delta,
    prompt="A beautiful landscape with mountains",
    guidance_scale=3.5,
    num_inference_steps=28
)

πŸ› οΈ Training

Data Preparation

Your training data should be organized as follows:

data/
β”œβ”€β”€ start/          # Input images (960x512)
β”œβ”€β”€ reference/      # Reference images (512x512)
└── end/           # Target images (896x512)
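
A minimal loader sketch for this layout (folder names come from the tree above; pairing the three folders by identical filenames is an assumption):

# Minimal dataset sketch for the start/reference/end layout shown above.
# Pairing images by identical filename across the three folders is an assumption.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class OminiKontextFolderDataset(Dataset):
    def __init__(self, root="data"):
        self.root = Path(root)
        self.names = sorted(p.name for p in (self.root / "start").iterdir())

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        return {
            "start": Image.open(self.root / "start" / name).convert("RGB"),
            "reference": Image.open(self.root / "reference" / name).convert("RGB"),
            "end": Image.open(self.root / "end" / name).convert("RGB"),
        }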

Training Configuration

# Training config
config = {
    "flux_pipe_id": "black-forest-labs/FLUX.1-Kontext-dev",
    "lora_config": {
        "r": 16,
        "lora_alpha": 32,
        "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
        "lora_dropout": 0.1,
        "bias": "none",
        "task_type": "CAUSAL_LM"
    },
    "optimizer_config": {
        "type": "AdamW",
        "params": {
            "lr": 1e-4,
            "weight_decay": 0.01,
            "betas": (0.9, 0.999)
        }
    },
    "gradient_checkpointing": True
}

Start Training

# Basic training
python train/script/train.py --config train/config/basic.yaml

# Multi-GPU training
python train/script/train.py --config train/config/multi_gpu.yaml

# Resume training
python train/script/train.py --config train/config/resume.yaml --resume_from_checkpoint path/to/checkpoint.ckpt

Training Monitoring

# Monitor with TensorBoard
tensorboard --logdir runs/

# Monitor with Weights & Biases
wandb login
python train/script/train.py --config train/config/wandb.yaml

πŸ“š Examples

Character Insertion

See examples/character_insert.ipynb for a complete example of inserting characters into scenes.

Trained Model: Check out the omini-kontext-character model on Hugging Face, which is specifically trained to insert cartoon characters into existing scenes.

πŸ—οΈ Model Architecture

The Flux Omini Kontext pipeline consists of several key components:

Base model

FLUX.1-Kontext-dev model

LoRA Integration

# LoRA layers are applied to attention modules
target_modules = ["to_q", "to_k", "to_v", "to_out.0"]

# LoRA configuration
lora_config = {
    "r": 16,                    # Rank
    "lora_alpha": 32,           # Alpha scaling
    "lora_dropout": 0.1,        # Dropout rate
    "bias": "none",             # Bias handling
    "task_type": "CAUSAL_LM"    # Task type
}

Training Process

  1. Input Processing: Encode input and reference images
  2. Text Encoding: Process prompts with CLIP and T5
  3. LoRA Forward: Apply LoRA layers during forward pass
  4. Noise Prediction: Train to predict noise
  5. Loss Computation: MSE loss on noise prediction
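
A simplified sketch of how steps 1-5 fit together in one training step (the transformer call signature, scheduler usage, and variable names are illustrative assumptions; the actual loop lives in the training scripts):

# Simplified sketch of one training step, following steps 1-5 above.
# The transformer call signature and scheduler usage are assumptions,
# not the repository's actual internals.
import torch
import torch.nn.functional as F

def training_step(transformer, scheduler, latents, ref_latents, text_embeds):
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],), device=latents.device
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Steps 3-4: forward pass through the LoRA-augmented transformer,
    # conditioned on reference latents and text embeddings.
    noise_pred = transformer(noisy_latents, ref_latents, text_embeds, timesteps)

    # Step 5: MSE loss on the noise prediction.
    return F.mse_loss(noise_pred, noise)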

βš™οΈ Configuration

Pipeline Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| image | PIL.Image | None | Input image |
| reference | PIL.Image | None | Reference image |
| reference_delta | List[int] | [0, 0, 0] | Position offset for the reference (specific to the trained LoRA; recommended: [0, 0, (1024+512)//16]) |
| prompt | str | None | Text prompt |
| prompt_2 | str | None | Secondary text prompt |
| guidance_scale | float | 3.5 | Classifier-free guidance scale |
| num_inference_steps | int | 28 | Number of denoising steps |
| height | int | 1024 | Output height |
| width | int | 1024 | Output width |
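
As a quick check on the recommended reference_delta above: (1024 + 512) // 16 = 96, which matches the delta used by the character_3000 and product_2000 LoRAs. A tiny helper (generalising the formula beyond the documented 1024/512 case is an assumption):

# Hypothetical helper: third component of reference_delta as recommended above.
# Generalising (base_width + reference_width) // 16 to other sizes is an assumption.
def recommended_reference_delta(base_width=1024, reference_width=512):
    return [0, 0, (base_width + reference_width) // 16]

print(recommended_reference_delta())  # [0, 0, 96]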

Training Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| learning_rate | float | 1e-4 | Learning rate |
| batch_size | int | 1 | Training batch size |
| max_epochs | int | 10 | Maximum training epochs |
| gradient_accumulation_steps | int | 1 | Gradient accumulation steps |
| warmup_steps | int | 100 | Learning rate warmup steps |

ComfyUI Integration

Simply clone this repo into your ComfyUI/custom_nodes folder. This integration lets you use the native ComfyUI nodes together with the OminiKontext nodes. There are two nodes in the repo:

  1. OminiKontextConditioning - conditions the model on a reference image, along with a delta value.
  2. OminiKontextModelPatch - patches the Kontext model.

Drop this image into the ComfyUI interface to load the workflow:

ComfyUI Workflow

Components


Alternative ComfyUI integration -

Repo link - https://github.com/tercumantanumut/ComfyUI-Omini-Kontext

Thanks to tercumantanumut for the ComfyUI integration!

Star History

Star History Chart

🀝 Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“„ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Black Forest Labs for the FLUX.1-Kontext-dev model
  • HuggingFace for the diffusers library
  • PyTorch Lightning for the training framework
  • PEFT for LoRA implementation
  • OminiControl for the universal control framework for Diffusion Transformers
  • ComfyUI-Omini-Kontext for the ComfyUI integration

πŸ“š References

@article{omini-kontext,
  title={OminiKontext: Multi-image references for image to image instruction models},
  author={Saquib Alam},
  year={2025}
}

πŸ“ž Support


Made with ❀️ for the AI community
