Chenxuan Miao, Yutong Feng, Jianshu Zeng, Zixiang Gao, Hantang Liu, Yunfeng Yan, Donglian Qi, Xi Chen, Bin Wang, Hengshuang Zhao
⭐ If ROSE is helpful to your projects, please help star this repo. Thanks! 🤗
📖 For more visual results, check out our project page.
- Release checkpoints.
- Release inference code.
- Release gradio demo.
[Demo videos: paired grids of Masked Input and Output; see the project page for the visual results.]
- Clone Repo

  ```shell
  git clone https://github.com/Kunbyte-AI/ROSE.git
  ```
- Create Conda Environment and Install Dependencies

  ```shell
  # create new anaconda env
  conda create -n rose python=3.12 -y
  conda activate rose

  # install python dependencies
  pip3 install -r requirements.txt
  ```
- CUDA = 12.4
- PyTorch = 2.6.0
- Torchvision = 0.21.0
- Other required packages listed in `requirements.txt`
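After installation, a quick sanity check against the versions above (a minimal sketch):

```python
# quick sanity check of the installed environment
import torch
import torchvision

print(torch.__version__)          # expect 2.6.0
print(torchvision.__version__)    # expect 0.21.0
print(torch.version.cuda)         # expect 12.4
print(torch.cuda.is_available())  # should be True on a working GPU setup
```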
We use the pretrained Wan2.1-Fun-1.3B-InP as our base model. During training, we train only the WanTransformer3D part and keep the other parts frozen. You can download the ROSE Transformer3D weights from this link.
For local inference, the `weights` directory should be arranged like this:

```
weights
└── transformer
    ├── config.json
    └── diffusion_pytorch_model.safetensors
```
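If the checkpoint is hosted on the Hugging Face Hub, it can be fetched with `huggingface_hub` (a minimal sketch; the repo id `Kunbyte-AI/ROSE` is an assumption, use the link above as the source of truth):

```python
# hypothetical: download the ROSE Transformer3D weights into weights/transformer
from huggingface_hub import hf_hub_download

for name in ["config.json", "diffusion_pytorch_model.safetensors"]:
    hf_hub_download(
        repo_id="Kunbyte-AI/ROSE",      # assumed repo id, see the link above
        filename=f"transformer/{name}",
        local_dir="weights",            # files land in weights/transformer/
    )
```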
You also need to prepare the base model in the `models` directory. You can download the Wan2.1-Fun-1.3B-InP base model from this link.
The `models` directory will be arranged like this:

```
models
└── Wan2.1-Fun-1.3B-InP
    ├── google
    │   └── umt5-xxl
    │       ├── spiece.model
    │       ├── special_tokens_map.json
    │       └── ...
    ├── xlm-roberta-large
    │   ├── sentencepiece.bpe.model
    │   ├── tokenizer_config.json
    │   └── ...
    ├── config.json
    ├── configuration.json
    ├── diffusion_pytorch_model.safetensors
    ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    ├── models_t5_umt5-xxl-enc-bf16.pth
    └── Wan2.1_VAE.pth
```
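If the base model is mirrored on the Hugging Face Hub, the whole repo can be pulled in one call (a sketch; the repo id `alibaba-pai/Wan2.1-Fun-1.3B-InP` is an assumption, follow the download link above):

```python
# hypothetical: mirror the base model repo into models/Wan2.1-Fun-1.3B-InP
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-1.3B-InP",  # assumed repo id
    local_dir="models/Wan2.1-Fun-1.3B-InP",
)
```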
For the Gradio demo, we use the pretrained SAM to generate masks. For more information about the Gradio demo, please check out the README under the `hugging_face` folder.
The complete `weights` directory structure for the Gradio demo will be arranged as:

```
weights
├── transformer
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── cutie-base-mega.pth
├── sam_vit_h_4b8939.pth
└── download_sam_ckpt.sh
```
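For reference, generating a mask for a single frame with SAM looks roughly like this (a minimal sketch using the `segment-anything` API and the `sam_vit_h_4b8939.pth` checkpoint above; the frame path and click coordinates are hypothetical, and the demo's actual pipeline lives in the `hugging_face` folder):

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# load the ViT-H SAM checkpoint listed in the weights tree above
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# segment the object around a single positive click (hypothetical frame/point)
frame = cv2.cvtColor(cv2.imread("frame_0000.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[360, 240]]),
    point_labels=np.array([1]),  # 1 = foreground click
    multimask_output=False,
)
cv2.imwrite("mask_0000.png", masks[0].astype(np.uint8) * 255)
```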
We provide some examples in the `data/eval` folder. Run the following command to try it out:

```shell
python inference.py
```
```
Usage:
    python inference.py [options]

Options:
    --validation_videos    Path(s) to input videos
    --validation_masks     Path(s) to mask videos
    --validation_prompts   Text prompts (default: [""])
    --output_dir           Output directory
    --video_length         Number of frames per video (must be 16n+1)
    --sample_size          Frame size: height width (default: 480 720)
```
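For example, a full invocation might look like this (the video and mask paths are hypothetical placeholders; `--video_length 49` satisfies the 16n+1 constraint):

```shell
python inference.py \
    --validation_videos data/eval/example/video.mp4 \
    --validation_masks data/eval/example/mask.mp4 \
    --validation_prompts "" \
    --output_dir output \
    --video_length 49 \
    --sample_size 480 720
```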
We also provide an interactive demo for object removal, allowing users to select any object they wish to remove from a video. You can try the demo on Hugging Face or run it locally.
If you find our repo useful for your research, please consider citing our paper:
```bibtex
@article{miao2025rose,
  title={ROSE: Remove Objects with Side Effects in Videos},
  author={Miao, Chenxuan and Feng, Yutong and Zeng, Jianshu and Gao, Zixiang and Liu, Hantang and Yan, Yunfeng and Qi, Donglian and Chen, Xi and Wang, Bin and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2508.18633},
  year={2025}
}
```
If you have any questions, please feel free to reach out to me at weiyuchoumou526@gmail.com.
This code is based on Wan2.1-Fun-1.3B-Inpaint, and some code is borrowed from ProPainter. Thanks for their awesome work!