BMVC 2025 · arXiv:2407.20232
This repository contains the official implementation of “Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing”. Our method, SANE (Specify-And-Edit), resolves ambiguous user instructions by first decomposing them into precise sub-edits and then executing each step with an image-editing diffusion model.
```bash
git clone https://github.com/fabvio/SANE.git
cd SANE
conda env create -f requirements.yml  # creates "sane" env by default
conda activate sane
```
The `requirements.yml` file already pins Python ≥3.10, PyTorch ≥2.2, Diffusers, and all other dependencies tested for this paper.
```text
<dataset_path>/
├── original_images/    # JPEG/PNG input images
│   ├── xxx.jpg
│   └── …
└── instructions/       # plain-text editing prompts (one per image)
    ├── xxx.txt
    └── …
```
**Naming consistency:** for every `xxx.jpg` (or `.png`) in `original_images/`, there must be a matching `xxx.txt` in `instructions/`.
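As a quick sanity check, here is a minimal sketch (not part of the repository) that flags mismatched names under the layout above:

```python
from pathlib import Path

ds = Path("<dataset_path>")  # replace with your dataset root
images = {p.stem for p in (ds / "original_images").iterdir()
          if p.suffix.lower() in {".jpg", ".png"}}
prompts = {p.stem for p in (ds / "instructions").glob("*.txt")}

# Any name printed here breaks the one-to-one pairing SANE expects
print("images without a prompt:", sorted(images - prompts))
print("prompts without an image:", sorted(prompts - images))
```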
Before training or evaluation you may want to generate
- captions (if your dataset lacks them), and
- decomposed instructions for disambiguation.
```bash
# Environment variables (only need to export once per session)
export PYTHONPATH=.
export OPENAI_API_KEY=<your_openai_api_key>

python preprocess/caption.py \
    --ds_path <dataset_path>
```
By default we query an OpenAI vision model to caption every image; the captions are saved to a folder inside `<dataset_path>`.
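For reference, captioning with the OpenAI Python SDK looks roughly like the sketch below; the actual `preprocess/caption.py` may use a different model and output format, and the model name `gpt-4o` here is an assumption:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("original_images/xxx.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # assumption: any OpenAI vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
caption = resp.choices[0].message.content
```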
```bash
python preprocess/decompose.py --ds_path <dataset_path>
```
This creates a folder of decomposed instructions: the precise sub-edits that SANE executes.
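Conceptually, the decomposition asks an LLM to turn one ambiguous prompt into precise sub-edits. A hedged sketch of that idea (the prompt wording and model are illustrative, not the script's actual ones):

```python
from openai import OpenAI

client = OpenAI()
instruction = "make the photo look better"  # an ambiguous user prompt

resp = client.chat.completions.create(
    model="gpt-4o",  # assumption; decompose.py may use a different model
    messages=[
        {"role": "system",
         "content": "Rewrite the ambiguous image-editing instruction as a short "
                    "numbered list of precise, independent sub-edits."},
        {"role": "user", "content": f"Instruction: {instruction}"},
    ],
)
# e.g. ["increase the brightness", "sharpen the subject", "boost color saturation"]
sub_edits = [line.split(". ", 1)[-1]
             for line in resp.choices[0].message.content.splitlines()
             if line.strip()]
```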
```bash
python infer/sane_inference.py --ds_path <dataset_path>
```
Outputs are written to `<output_dir>/results/sane/`.
```bash
python infer/sane_inference.py \
    --ds_path <dataset_path> \
    --model_id <model_name>
```
Any Hugging Face Diffusers model based on InstructPix2Pix will work (e.g. `timbrooks/instruct-pix2pix`).
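To illustrate the execution stage, here is a minimal Diffusers sketch that applies decomposed sub-edits sequentially with InstructPix2Pix; the hyperparameters are illustrative defaults, not the repository's settings:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("original_images/xxx.jpg").convert("RGB")
sub_edits = ["increase the brightness", "boost color saturation"]  # from decompose.py

# Feed each intermediate result back in, so the sub-edits compose
for sub_edit in sub_edits:
    image = pipe(prompt=sub_edit, image=image,
                 num_inference_steps=20, image_guidance_scale=1.5).images[0]
image.save("edited.png")
```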
If you build upon this work, please cite:
```bibtex
@inproceedings{iakovleva2024specify,
  title={Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing},
  author={Iakovleva, Ekaterina and Pizzati, Fabio and Torr, Philip and Lathuili{\`e}re, St{\'e}phane},
  booktitle={The British Machine Vision Conference},
  year={2025}
}
```