
JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on

Paper | Homepage | Checkpoints | Demo | License

Aowen Wang¹, Wei Li¹, Hao Luo¹ ², Mengxing Ao¹, Fan Wang¹

¹DAMO Academy, Alibaba Group ²Hupan Lab

Overview

JCo-MVTON is a mask-free virtual try-on framework built on MM-DiT. It addresses key limitations of existing systems: rigid dependence on human-body masks, limited fine-grained control over garment attributes, and poor generalization to in-the-wild scenarios.
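
To make the idea of joint multi-modal conditioning concrete, the sketch below shows a toy MM-DiT-style block in which the denoised latent tokens, the person-reference tokens, and the garment-reference tokens are mixed in a single joint attention pass. This is an illustrative sketch only, not the JCo-MVTON implementation; every module name and shape here is hypothetical.

import torch
import torch.nn as nn

class JointAttentionBlock(nn.Module):
    """Toy MM-DiT-style block: latent, person, and garment tokens attend jointly."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, latent, person, garment):
        # Concatenate the three token streams so they exchange information in one attention pass.
        x = torch.cat([latent, person, garment], dim=1)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        # Only the denoised latent tokens continue along the generation path.
        return x[:, : latent.shape[1]]

# Hypothetical shapes: batch of 2, 64 latent / 32 person / 32 garment tokens, width 256.
block = JointAttentionBlock()
out = block(torch.randn(2, 64, 256), torch.randn(2, 32, 256), torch.randn(2, 32, 256))
print(out.shape)  # torch.Size([2, 64, 256])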


Quick Start

Clone the repository

git clone https://github.com/damo-cv/JCo-MVTON.git
cd JCo-MVTON

Create conda environment

conda create -n jco-mvton python=3.10
conda activate jco-mvton

Install dependencies

pip install -r requirements.txt
git clone https://github.com/huggingface/diffusers.git
cd diffusers
git checkout v0.33.0
cp ../flux/modeling_utils.py src/diffusers/models/
pip install .
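
A quick way to confirm that the patched diffusers build is the one Python imports (an optional check, not part of the official instructions):

import diffusers
from diffusers import FluxPipeline, FluxTransformer2DModel  # classes used in Basic Usage below

print(diffusers.__version__)  # should report 0.33.0
print(diffusers.__file__)     # should point at the copy installed from this checkout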

Download Pre-trained Models

# Download the upper-body model checkpoint into ckpts/ (where Basic Usage below expects it)
wget -P ckpts https://huggingface.co/Damo-vision/JCo-MVTON/resolve/main/try_on_upper.pt

# Download the lower-body model checkpoint
wget -P ckpts https://huggingface.co/Damo-vision/JCo-MVTON/resolve/main/try_on_lower.pt

# Download the dress model checkpoint
wget -P ckpts https://huggingface.co/Damo-vision/JCo-MVTON/resolve/main/try_on_dress.pt
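
As an optional sanity check on the downloaded files (a sketch that assumes the checkpoint layout used in Basic Usage below, where the weights sit under a 'module' key):

import torch

ckpt = torch.load("ckpts/try_on_upper.pt", map_location="cpu")
print(list(ckpt.keys()))                               # expect a dict with a 'module' entry
state_dict = ckpt["module"]
print(len(state_dict), "tensors, e.g.", next(iter(state_dict)))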

Basic Usage

# Load transformer
# Note: this requires the patched diffusers build from Quick Start; `extra_branch_num`
# and the pipeline's cloth_img / person_img / mode arguments come from this repo's
# modified FLUX modules and are not part of stock diffusers.
import torch
import torchvision.utils as vutils
from torchvision import transforms
from PIL import Image
from diffusers import FluxPipeline, FluxTransformer2DModel

device = "cuda"
torch_dtype = torch.bfloat16
width, height = 768, 1024                    # example output resolution
extra_branch_num = 2                         # assumed: one extra branch each for the garment and person references
seed, n_steps, guidance_scale = 42, 30, 3.5  # example sampling settings
mode = "upper"                               # assumed value matching the upper-body checkpoint

model_id = "black-forest-labs/FLUX.1-dev"
ckpt = "ckpts/try_on_upper.pt"
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    subfolder="transformer",
    extra_branch_num=extra_branch_num,
    low_cpu_mem_usage=False,
).to(device)
transformer.load_state_dict(torch.load(ckpt, map_location="cpu")["module"], strict=False)
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    transformer=transformer,
).to(device)

# Load and preprocess images
# Placeholder transforms: model inputs normalized to [-1, 1], outputs kept in [0, 1];
# prefer the preprocessing utilities shipped with this repo if they differ.
transform_person = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.5] * 3, [0.5] * 3)])
transform_cloth = transforms.Compose([transforms.ToTensor(), transforms.Normalize([0.5] * 3, [0.5] * 3)])
transform_output = transforms.Compose([transforms.Resize((height, width)), transforms.ToTensor()])

person = Image.open("assets/ref.jpg").convert("RGB").resize((width, height))
cloth = Image.open("assets/upper.jpg").convert("RGB").resize((width, height))

person_tensor = transform_person(person)
cloth_tensor = transform_cloth(cloth)

prompt = "A fashion model wearing stylish clothing, high-resolution 8k, detailed textures, realistic lighting, fashion photography style."

# Generate image
with torch.inference_mode():
    generated_image = pipe(
        generator=torch.Generator(device="cpu").manual_seed(seed),
        prompt=prompt,
        num_inference_steps=n_steps,
        guidance_scale=guidance_scale,
        height=height,
        width=width,
        cloth_img=cloth_tensor,
        person_img=person_tensor,
        extra_branch_num=extra_branch_num,
        mode=mode,
        max_sequence_length=77,
    ).images[0]

# Save result: garment | person | try-on result, side by side
cloth_vis = transform_output(cloth)
person_vis = transform_output(person)
generated_vis = transform_output(generated_image)

concatenated = torch.cat((cloth_vis, person_vis, generated_vis), dim=2)
vutils.save_image(concatenated, "output.png")
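
Because the transformer and pipeline are loaded only once, additional garments can be tried on with the same call in a loop. A small usage sketch reusing the objects defined above (the second garment path is a placeholder):

# Try several garments on the same person with the already-loaded pipeline.
for i, cloth_path in enumerate(["assets/upper.jpg", "assets/upper_2.jpg"]):  # placeholder paths
    cloth = Image.open(cloth_path).convert("RGB").resize((width, height))
    cloth_tensor = transform_cloth(cloth)
    with torch.inference_mode():
        result = pipe(
            generator=torch.Generator(device="cpu").manual_seed(seed),
            prompt=prompt,
            num_inference_steps=n_steps,
            guidance_scale=guidance_scale,
            height=height,
            width=width,
            cloth_img=cloth_tensor,
            person_img=person_tensor,
            extra_branch_num=extra_branch_num,
            mode=mode,
            max_sequence_length=77,
        ).images[0]
    result.save(f"output_{i}.png")  # .images[0] is a PIL image, so save() works directly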

Results


JCo-MVTON achieves state-of-the-art performance across multiple metrics:

Method | SSIM ↑ (Paired) | FID ↓ (Paired) | KID ↓ (Paired) | LPIPS ↓ (Paired) | FID ↓ (Unpaired) | KID ↓ (Unpaired)
MV-VTON (Wang et al., 2025b) | 0.8083 | 15.442 | 7.501 | 0.1171 | 17.900 | 3.861
OOTDiffusion (Xu et al., 2024) | 0.8187 | 9.305 | 4.086 | 0.0876 | 12.408 | 4.689
JCo-MVTON (Ours) | 0.8601 | 8.103 | 2.003 | 0.0891 | 9.561 | 2.700
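
For reference, these metrics can be reproduced on a test split with off-the-shelf implementations. The snippet below is a sketch using torchmetrics (install with pip install torchmetrics[image]), not the evaluation code used in the paper; the random tensors stand in for real and generated image batches in [0, 1].

import torch
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

real = torch.rand(4, 3, 256, 256)   # placeholder ground-truth try-on images in [0, 1]
fake = torch.rand(4, 3, 256, 256)   # placeholder generated images in [0, 1]

ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
fid = FrechetInceptionDistance(feature=2048, normalize=True)

fid.update(real, real=True)
fid.update(fake, real=False)
print("SSIM :", ssim(fake, real).item())
print("LPIPS:", lpips(fake, real).item())
print("FID  :", fid.compute().item())   # only meaningful over a full test split, not 4 samples
# KID can be computed analogously with torchmetrics.image.kid.KernelInceptionDistance.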

Citation

If you find our work useful, please cite:

@article{wang2024jco,
  title={JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on},
  author={Wang, Aowen and Li, Wei and Luo, Hao and Ao, Mengxing and Wang, Fan},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2024}
}

License

This project is released under the Apache 2.0 license.

Acknowledgments

We thank the open-source community for their valuable contributions and the reviewers for their constructive feedback. Special thanks to the DAMO Academy and Hupan Lab for supporting this research.
