This repository is the official implementation of FlowAlign, an inversion-free and training-free image editing algorithm.
💡 Recent inversion-free, flow-based editors leverage models like Stable Diffusion 3 to enable text-driven image editing via ODE integration.
🤔 However, skipping latent inversion often leads to unstable trajectories and poor source consistency.
🚀 FlowAlign addresses this by introducing a flow-matching loss, a simple yet effective regularizer that encourages smooth, semantically aligned, and structurally consistent edits.
🌟 Thanks to its ODE-based formulation, FlowAlign naturally supports reverse editing, which demonstrates that its transformations are both reversible and robust.
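The bullets above rely on two generic flow-matching ingredients: a straight-line path between two endpoints with a squared-error loss on its velocity, and an ODE that can be integrated forward and then backward. The NumPy sketch below illustrates only these textbook pieces. It is not FlowAlign's regularizer or update rule. `toy_velocity` is a hypothetical stand-in for the prompt-conditioned velocity network of a model such as Stable Diffusion 3, and all function names are illustrative.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(v_pred, x0, x1):
    """Squared error between a predicted velocity and the path's constant velocity x1 - x0."""
    return float(np.mean((v_pred - (x1 - x0)) ** 2))

def euler_integrate(velocity, x, t_start, t_end, n_steps):
    """Solve dx/dt = velocity(x, t) with explicit Euler steps; t_end < t_start integrates backward."""
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

def toy_velocity(x, t):
    """Hypothetical stand-in for a learned velocity field: a rotation whose speed grows with t."""
    return (1.0 + t) * np.array([-x[1], x[0]])

x_src = np.array([1.0, 0.0])
x_edit = euler_integrate(toy_velocity, x_src, 0.0, 1.0, 2000)   # forward pass ("source -> edit")
x_back = euler_integrate(toy_velocity, x_edit, 1.0, 0.0, 2000)  # backward pass ("edit -> source")
print("forward:", x_edit, "round trip:", x_back)
```

Running the same ODE backward returns close to the starting point, and the gap shrinks as `n_steps` grows. This is the property that makes reverse editing natural for ODE-based editors.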
Clone this repo:
git clone https://github.com/FlowAlign/FlowAlign.git
cd FlowAlign
To install the requirements:
conda create -n flowalign python=3.11
conda activate flowalign
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
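After installation, you may want to confirm that the environment matches the pins above before running any edits. The following is a minimal standard-library sketch. It checks only the Python version and the `torch`/`torchvision` versions from the install commands, and it ignores the `+cu118` local tag. The function names are illustrative and do not come from the repository.

```python
import sys
from importlib import metadata

# Versions pinned by the install commands above (local tags such as "+cu118" are ignored).
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2"}

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} found, 3.11 expected")
    for package, want in expected.items():
        have = installed_version(package)
        if have is None:
            problems.append(f"{package} is not installed")
        elif have.split("+")[0] != want:
            problems.append(f"{package} {have} found, {want} expected")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

This sketch does not test CUDA. Once torch is installed, `torch.cuda.is_available()` reports whether a GPU is visible.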
For text-based image editing, run:
Example 1
python run_edit.py \
--img_path samples/gas_station.png \
--src_prompt 'A gas station with a white and red sign that reads "CAFE" There are several cars parked in front of the gas station, including a white car and a van.' \
--tgt_prompt 'A gas station with a white and red sign that reads "NeurIPS" There are several cars parked in front of the gas station, including a white car and a van.'
The expected result:
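The Example 1 prompts contain double quotes of their own, which makes it easy to quote them incorrectly in a shell. The minimal sketch below builds the same command as a Python argument list and uses `shlex.join` to print a correctly quoted shell version. It assumes only the `run_edit.py` flags shown in the example.

```python
import shlex

# Prompts exactly as in Example 1, including the embedded double quotes.
src_prompt = ('A gas station with a white and red sign that reads "CAFE" There are several cars '
              'parked in front of the gas station, including a white car and a van.')
tgt_prompt = src_prompt.replace('"CAFE"', '"NeurIPS"')

argv = [
    "python", "run_edit.py",
    "--img_path", "samples/gas_station.png",
    "--src_prompt", src_prompt,
    "--tgt_prompt", tgt_prompt,
]

# shlex.join quotes each argument so the embedded double quotes survive a POSIX shell.
print(shlex.join(argv))
# To launch without any shell quoting at all: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell parsing entirely. This is usually the safer choice when prompts contain quotes or other special characters.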
Example 2
python run_edit.py \
--img_path samples/raw_meat.jpg \
--src_prompt "a photo of raw meat with herb on it" \
--tgt_prompt "a photo of raw salmon with herb on it"
The expected result:
You can switch the editing method with the following argument:
--method: dual / sdedit / flowedit / flowalign
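To compare the four methods on the same input, you can generate one command per method. The sketch below makes two assumptions that you should check against `run_edit.py`: that the method is selected with `--method`, and that `--efficient_memory` is an on/off switch. It defaults to a dry run that only prints the commands. The helper names are hypothetical, and the sketch does not know where `run_edit.py` writes its outputs, so confirm that runs do not overwrite one another before you execute them.

```python
import shlex
import subprocess

# The four values listed above; selecting them via "--method" is an assumption about run_edit.py.
METHODS = ("dual", "sdedit", "flowedit", "flowalign")

def build_command(method, img_path, src_prompt, tgt_prompt, efficient_memory=False):
    """Assemble one run_edit.py invocation as an argument list (no shell quoting needed)."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    cmd = ["python", "run_edit.py",
           "--img_path", img_path,
           "--src_prompt", src_prompt,
           "--tgt_prompt", tgt_prompt,
           "--method", method]
    if efficient_memory:
        cmd.append("--efficient_memory")  # assumed to be a flag that takes no value
    return cmd

def sweep(img_path, src_prompt, tgt_prompt, dry_run=True, efficient_memory=False):
    """Build one command per method; print them if dry_run, otherwise execute them in order."""
    commands = [build_command(m, img_path, src_prompt, tgt_prompt, efficient_memory) for m in METHODS]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    sweep("samples/raw_meat.jpg",
          "a photo of raw meat with herb on it",
          "a photo of raw salmon with herb on it")
```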
If you pass --efficient_memory, the text encoder pre-computes the text embeddings and is then removed from the GPU.
This makes it possible to run image editing on a single GPU with 24 GB of VRAM.
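Moving the text encoder off the GPU helps because weight memory scales with parameter count times bytes per parameter. The back-of-the-envelope sketch below counts weights only and ignores activations, caches, and CUDA overhead. The parameter counts in `components` are hypothetical placeholders, not measurements of the checkpoints FlowAlign loads, so substitute the real counts before drawing conclusions.

```python
# Decimal gigabytes, matching how GPU VRAM is usually quoted (e.g. "24 GB").
GB = 1e9
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

def weight_gb(n_params, dtype="fp16"):
    """Memory needed only to store the weights; activations and framework overhead are ignored."""
    return n_params * BYTES_PER_PARAM[dtype] / GB

def resident_weights_gb(components, offloaded=(), dtype="fp16"):
    """Sum the weight memory of the components that stay on the GPU."""
    return sum(weight_gb(n, dtype) for name, n in components.items() if name not in offloaded)

# Hypothetical parameter counts for illustration only; replace with your checkpoint's values.
components = {"text_encoders": 5.0e9, "denoiser": 2.0e9, "vae": 0.1e9}

before = resident_weights_gb(components)
after = resident_weights_gb(components, offloaded={"text_encoders"})
print(f"weights on GPU: {before:.1f} GB -> {after:.1f} GB after removing the text encoders")
```

Because the embeddings are computed once up front, the encoder is not needed during the editing loop. Its weight memory can therefore go to the denoiser and its activations instead.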