
Flux Omini Kontext framework for multi-image reference: training and inference

Python PyTorch Lightning License

Updates

  • Qwen-Image-Edit support added. A spatial character insertion model is available on HF (link below). This is an experimental model, not yet trained on a large dataset, and it sometimes inserts duplicate characters.
  • This repository includes ComfyUI nodes and patches that are compatible with the Nunchaku extension, so you can run the Nunchaku version of the Flux Kontext model with Omini Kontext at lightning speed.

πŸš€ Live Demo

demo.mp4

Replicate version: https://replicate.com/thefluxtrain/omini-kontext

Installation guides, workflows, tutorials, and demos

If you have trained your own model, you can use it on Replicate: upload the model to HF and enter its details on Replicate.

OminiKontext is a framework built around the FLUX.1-Kontext-dev model. We do not alter the model architecture; instead, we adjust the 3D RoPE embeddings to enable reference-based edits on a given image.

The approach is heavily inspired by the OminiControl project, which uses the same RoPE embedding trick to achieve reference-based image generation with the FLUX.1-dev model. However, FLUX.1-dev uses 2D RoPE embeddings, whereas Kontext uses 3D RoPE embeddings.

More details on delta values: see issue #12.
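
For intuition, here is a minimal, illustrative sketch (not the pipeline's internal code) of what the delta does: each latent token carries a 3D RoPE position id, and the reference tokens are offset by reference_delta before the embedding is computed, so the reference occupies a region of positional space that does not collide with the base image. All names and shapes below are assumptions.

# Illustrative sketch only: shifting the 3D RoPE position ids of reference tokens
# by reference_delta = [d_index, d_y, d_x]. Not the pipeline's actual API.
import torch

def shift_reference_ids(ref_ids, reference_delta=(0, 0, 96)):
    """ref_ids: (num_tokens, 3) position ids laid out as (index, y, x)."""
    delta = torch.tensor(reference_delta, dtype=ref_ids.dtype, device=ref_ids.device)
    return ref_ids + delta  # broadcast the same offset over every reference token

# Example: a 512x512 reference maps to a 32x32 latent grid (one token per 16x16 pixel patch).
ys, xs = torch.meshgrid(torch.arange(32), torch.arange(32), indexing="ij")
ref_ids = torch.stack([torch.zeros_like(ys), ys, xs], dim=-1).reshape(-1, 3)
shifted = shift_reference_ids(ref_ids, reference_delta=[0, 0, 96])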

🎨 Generated Samples

Spatial Character Insertion

The following examples demonstrate how the trained model can insert cartoon characters into existing scenes. The model takes a reference character placed at the desired position on a white image, plus a scene image, and inserts the character into the scene around that position. It takes some freedom in the exact placement (not exactly at the desired position), based on what is plausible for the scene (common sense).

I used 30 image pairs to train the model for the intuitive blending task. Sometimes the results are not good, but overall the model is able to blend the character into the scene. This is more of a proof of concept; I plan to train another model (also open source) on a much larger dataset to make it more robust.

Scene Reference Character Generated Result
Scene 1 Boy Reference Output 1
Scene 2 Boy Reference Output 2
Scene 3 Boy Reference Output 3
πŸ“‹ Click to expand/collapse more spatial character insertion examples
Scene Reference Character Generated Result
Scene 4 Boy Reference Output 4
Scene 5 Boy Reference Output 5
Scene 6 Boy Reference Output 6
Scene 7 Boy Reference Output 7
Scene 8 Boy Reference Output 8
Scene 9 Boy Reference Output 9
Scene 10 Boy Reference Output 10
Scene 11 Boy Reference Output 11
Scene 12 Boy Reference Output 12
Scene 13 Boy Reference Output 13
Scene 14 Boy Reference Output 14

Non-spatial Character Insertion

The following examples demonstrate how the trained model can insert cartoon characters into existing scenes. There is no spatial control over the character in this case.

Scene Reference Character Generated Result
Scene 1 Boy Reference Output 1
Scene 2 Boy Reference Output 2

More coming soon!

Model Comparison

The following table compares the Omini Kontext model with a character insertion LoRA against the vanilla FLUX.1-Kontext-dev model. For the comparison, we used "Add character to the image. The character is scared." as the prompt for all images.

Scene Reference Vanilla Omini
Living Room Boy Living Room Boy Vanilla Living Room Boy Omini
Living Room Dog Living Room Dog Vanilla Living Room Dog Omini
πŸ“‹ Click to expand/collapse more model comparison examples
Scene Reference Vanilla Omini
Living Room Girl Living Room Girl Vanilla Living Room Girl Omini
Forest Boy Forest Boy Vanilla Forest Boy Omini
Forest Girl Forest Girl Vanilla Forest Girl Omini
Forest Dog Forest Dog Vanilla Forest Dog Omini

Pretrained LoRA models

| Model Name | Delta | Description |
|---|---|---|
| character_3000.safetensors | [0, 0, 96] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Character on a white background. |
| spatial-character-test.safetensors | [1, 0, 0] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Upload a reference image of the same size as the base image, with the character placed where you want it to appear. |
| product_2000.safetensors | [0, 0, 96] | On ComfyUI, use cfg = 1.5 and LoRA strength = 0.5-0.7. Product on a white background. |

Notes:

  1. Sometimes the product or character comes out too big compared to the rest of the scene; simply use a smaller-resolution image.
  2. Trained on 512x512 reference images, but works fine with 1024x1024.
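
For the spatial LoRA, here is a rough usage sketch, assuming the diffusers pipeline from the Basic Inference example below is already set up as `pipe` with spatial-character-test.safetensors loaded via load_lora_weights. The file paths, paste position, and prompt are hypothetical.

# Rough sketch for the spatial LoRA: build a white reference of the SAME size
# as the scene and paste the character roughly where it should end up
# (see note 1 above if the character comes out too big).
from PIL import Image

scene = Image.open("scene.jpg").convert("RGB")
character = Image.open("character.png").convert("RGB")

reference = Image.new("RGB", scene.size, "white")
reference.paste(character, (600, 300))  # hypothetical target position

result = pipe(
    image=scene,
    reference=reference,
    reference_delta=[1, 0, 0],  # delta for spatial-character-test (see table above)
    prompt="Add the character to the image.",  # hypothetical prompt
    guidance_scale=3.5,
    num_inference_steps=28,
)
result.images[0].save("spatial_output.png")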

πŸ“‹ To-do

  • Add Qwen-Image-Edit support
  • Extend to input multiple references.
  • Create more demos for various use cases. Community support needed!
  • Add Nunchaku integration in ComfyUI
  • Use dataset from HF for training
  • Script to push the dataset to HuggingFace
  • Create an easy-to-use ComfyUI integration that uses native ComfyUI nodes. Scroll to the end.
  • Make a data processing script, available in helpers/dataset_creator.ipynb
  • Add ways to control location and scale of the reference character
  • Speed up by removing irrelevant pixels
  • Deploy a public demo
  • Deploy a replicate version
  • Add ComfyUI integration - scroll to the bottom
  • Basic training script
  • Basic inference script

Model Training Plans

  • Qwen-Image-Edit character model: Train character models for Qwen-Image-Edit.
  • Person Models: Develop models for realistic human subjects
  • Clothes Models: Create models for clothing and fashion items
  • Subject Models: Train models for specific objects and items
  • Character Models: Train specialized models for anime/cartoon characters

πŸš€ Quick Start

Setup Environment

# Create conda environment
conda create -n omini-kontext python=3.10
conda activate omini-kontext

# Install dependencies
pip install -r requirements.txt

Basic Training

πŸ“¦ Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended: 24GB+ VRAM)
  • PyTorch 2.0+
  • HuggingFace account for model access

Install Dependencies

# Core requirements (quote the specifiers so the shell does not treat ">" as redirection)
pip install "torch>=2.0.0" "lightning>=2.0.0"

# Install diffusers from GitHub (required for FluxKontext pipeline)
pip install git+https://github.com/huggingface/diffusers

# Training-specific requirements
pip install -r requirements.txt

Verify Installation

import torch
from src.pipeline_flux_omini_kontext import FluxOminiKontextPipeline

# Test pipeline loading
pipe = FluxOminiKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev"
)
print("βœ… Installation successful!")

🎯 Usage

Basic Inference

from diffusers.utils import load_image
from src.pipeline_flux_omini_kontext import FluxOminiKontextPipeline
import torch

# Load pipeline
pipe = FluxOminiKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Load images
input_image = load_image("path/to/input.jpg")
reference_image = load_image("path/to/reference.jpg")

# Load Character OminiKontext LoRA
pipe.load_lora_weights(
    "saquiboye/omini-kontext-character",
    weight_name="character_3000.safetensors",
    adapter_name="lora_weights"
)

# Generate
result = pipe(
    image=input_image,
    reference=reference_image,
    reference_delta=[0, 0, 96],  # Position delta for reference
    prompt="A beautiful landscape with mountains",
    guidance_scale=3.5,
    num_inference_steps=28
)

# Save result
result.images[0].save("output.png")

Optimizing Reference Images

The optimise_image_condition function helps improve inference and training performance by preprocessing reference images to optimize token usage. This optimization removes irrelevant pixels while preserving the essential features needed for conditioning.

from src.utils.image_utils import optimise_image_condition
from PIL import Image

# Load your reference image
reference = Image.open("path/to/reference.jpg")

# Optimize the reference image
reference_delta = [0, 0, 96]
optimised_reference, new_reference_delta = optimise_image_condition(reference, reference_delta)

# Use in inference
result = pipe(
    image=input_image,
    reference=optimised_reference,  # Pass the optimised reference
    reference_delta=new_reference_delta,
    prompt="A beautiful landscape with mountains",
    guidance_scale=3.5,
    num_inference_steps=28
)

πŸ› οΈ Training

Data Preparation

Your training data should be organized as follows:

data/
β”œβ”€β”€ start/          # Input images (960x512)
β”œβ”€β”€ reference/      # Reference images (512x512)
└── end/           # Target images (896x512)
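
A minimal loader sketch for this layout (folder names come from the tree above; pairing the three folders by identical filenames is an assumption):

# Minimal dataset sketch for the start/reference/end layout shown above.
# Pairing images by identical filename across the three folders is an assumption.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class OminiKontextFolderDataset(Dataset):
    def __init__(self, root="data"):
        self.root = Path(root)
        self.names = sorted(p.name for p in (self.root / "start").iterdir())

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        return {
            "start": Image.open(self.root / "start" / name).convert("RGB"),
            "reference": Image.open(self.root / "reference" / name).convert("RGB"),
            "end": Image.open(self.root / "end" / name).convert("RGB"),
        }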

Training Configuration

# Training config
config = {
    "flux_pipe_id": "black-forest-labs/FLUX.1-Kontext-dev",
    "lora_config": {
        "r": 16,
        "lora_alpha": 32,
        "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
        "lora_dropout": 0.1,
        "bias": "none",
        "task_type": "CAUSAL_LM"
    },
    "optimizer_config": {
        "type": "AdamW",
        "params": {
            "lr": 1e-4,
            "weight_decay": 0.01,
            "betas": (0.9, 0.999)
        }
    },
    "gradient_checkpointing": True
}

Start Training

# Basic training
python train/script/train.py --config train/config/basic.yaml

# Multi-GPU training
python train/script/train.py --config train/config/multi_gpu.yaml

# Resume training
python train/script/train.py --config train/config/resume.yaml --resume_from_checkpoint path/to/checkpoint.ckpt

Training Monitoring

# Monitor with TensorBoard
tensorboard --logdir runs/

# Monitor with Weights & Biases
wandb login
python train/script/train.py --config train/config/wandb.yaml

πŸ“š Examples

Character Insertion

See examples/character_insert.ipynb for a complete example of inserting characters into scenes.

Trained Model: Check out the omini-kontext-character model on Hugging Face, which is specifically trained to insert cartoon characters into existing scenes.

πŸ—οΈ Model Architecture

The Flux Omini Kontext pipeline consists of several key components:

Base model

FLUX.1-Kontext-dev model

LoRA Integration

# LoRA layers are applied to attention modules
target_modules = ["to_q", "to_k", "to_v", "to_out.0"]

# LoRA configuration
lora_config = {
    "r": 16,                    # Rank
    "lora_alpha": 32,           # Alpha scaling
    "lora_dropout": 0.1,        # Dropout rate
    "bias": "none",             # Bias handling
    "task_type": "CAUSAL_LM"    # Task type
}

Training Process

  1. Input Processing: Encode input and reference images
  2. Text Encoding: Process prompts with CLIP and T5
  3. LoRA Forward: Apply LoRA layers during forward pass
  4. Noise Prediction: Train to predict noise
  5. Loss Computation: MSE loss on noise prediction
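
A simplified sketch of how steps 1-5 fit together in one training step (the transformer call signature, scheduler usage, and variable names are illustrative assumptions; the actual loop lives in the training scripts):

# Simplified sketch of one training step, following steps 1-5 above.
# The transformer call signature and scheduler usage are assumptions,
# not the repository's actual internals.
import torch
import torch.nn.functional as F

def training_step(transformer, scheduler, latents, ref_latents, text_embeds):
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],), device=latents.device
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Steps 3-4: forward pass through the LoRA-augmented transformer,
    # conditioned on reference latents and text embeddings.
    noise_pred = transformer(noisy_latents, ref_latents, text_embeds, timesteps)

    # Step 5: MSE loss on the noise prediction.
    return F.mse_loss(noise_pred, noise)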

βš™οΈ Configuration

Pipeline Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| image | PIL.Image | None | Input image |
| reference | PIL.Image | None | Reference image |
| reference_delta | List[int] | [0, 0, 0] | Position offset for the reference (specific to the trained LoRA; recommended: [0, 0, (1024+512)//16]) |
| prompt | str | None | Text prompt |
| prompt_2 | str | None | Secondary text prompt |
| guidance_scale | float | 3.5 | Classifier-free guidance scale |
| num_inference_steps | int | 28 | Number of denoising steps |
| height | int | 1024 | Output height |
| width | int | 1024 | Output width |
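
As a quick check on the recommended reference_delta above: (1024 + 512) // 16 = 96, which matches the delta used by the character_3000 and product_2000 LoRAs. A tiny helper (generalising the formula beyond the documented 1024/512 case is an assumption):

# Hypothetical helper: third component of reference_delta as recommended above.
# Generalising (base_width + reference_width) // 16 to other sizes is an assumption.
def recommended_reference_delta(base_width=1024, reference_width=512):
    return [0, 0, (base_width + reference_width) // 16]

print(recommended_reference_delta())  # [0, 0, 96]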

Training Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| learning_rate | float | 1e-4 | Learning rate |
| batch_size | int | 1 | Training batch size |
| max_epochs | int | 10 | Maximum training epochs |
| gradient_accumulation_steps | int | 1 | Gradient accumulation steps |
| warmup_steps | int | 100 | Learning rate warmup steps |

ComfyUI Integration

Simply clone this repo into your ComfyUI/custom_nodes folder. This integration lets you use the native ComfyUI nodes together with the OminiKontext nodes. There are two nodes in the repo:

  1. OminiKontextConditioning - conditions the model on a reference image, along with a delta value.
  2. OminiKontextModelPatch - patches the Kontext model.

Drop this image into the ComfyUI interface to load the workflow:

ComfyUI Workflow

Components


Alternative ComfyUI integration -

Repo link - https://github.com/tercumantanumut/ComfyUI-Omini-Kontext

Thanks to tercumantanumut for the ComfyUI integration!

Star History

Star History Chart

🀝 Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“„ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Black Forest Labs for the FLUX.1-Kontext-dev model
  • HuggingFace for the diffusers library
  • PyTorch Lightning for the training framework
  • PEFT for LoRA implementation
  • OminiControl for the universal control framework for Diffusion Transformers
  • ComfyUI-Omini-Kontext for the ComfyUI integration

πŸ“š References

@article{omini-kontext,
  title={OminiKontext: Multi-image references for image to image instruction models},
  author={Saquib Alam},
  year={2025}
}

πŸ“ž Support


Made with ❀️ for the AI community
