
ROSE: Remove Objects with Side Effects in Videos

Zhejiang University, KunByte AI, Peking University, The University of Hong Kong
Under Review

⭐ If ROSE is helpful to your projects, please help star this repo. Thanks! 🤗

📖 For more visual results, go check out our project page.


TODO

  • Release checkpoints.
  • Release inference code.
  • Release gradio demo.

Results

For each category below, we show the masked input alongside the removal output (see the project page for the animated results):

  • Shadow
  • Reflection
  • Common
  • Light Source
  • Translucent
  • Mirror

Overview

(Figure: overall structure of the ROSE framework.)

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/Kunbyte-AI/ROSE.git
  2. Create Conda Environment and Install Dependencies

    # create new anaconda env
    conda create -n rose python=3.12 -y
    conda activate rose
    
    # install python dependencies
    pip3 install -r requirements.txt
    • CUDA = 12.4
    • PyTorch = 2.6.0
    • Torchvision = 0.21.0
    • Other required packages in requirements.txt
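
After installation, a quick sanity check helps confirm the environment matches the versions above (a minimal sketch, assuming the rose env is active):

    # verify PyTorch / Torchvision / CUDA versions and GPU availability
    python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.version.cuda, torch.cuda.is_available())"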

Get Started

Prepare pretrained models

We use the pretrained Wan2.1-Fun-1.3B-InP as our base model. During training, we train only the WanTransformer3D part and keep the other parts frozen. You can download the ROSE Transformer3D weights from this link.

For local inference, the weights directory should be arranged like this:

weights
 ├── transformer
   ├── config.json
   ├── diffusion_pytorch_model.safetensors

Also, it's necessary to prepare the base model in the models directory. You can download the Wan2.1-Fun-1.3B-InP base model from this link.

The models directory should be arranged like this:

models
 ├── Wan2.1-Fun-1.3B-InP
   ├── google
     ├── umt5-xxl
       ├── spiece.model
       ├── special_tokens_map.json
           ...
   ├── xlm-roberta-large
     ├── sentencepiece.bpe.model
     ├── tokenizer_config.json
         ...
   ├── config.json
   ├── configuration.json
   ├── diffusion_pytorch_model.safetensors
   ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
   ├── models_t5_umt5-xxl-enc-bf16.pth
   ├── Wan2.1_VAE.pth
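
After downloading, a quick check that the key files landed where inference expects them can catch layout mistakes early (a minimal sketch, matching the trees above):

    # verify the ROSE transformer and base-model files are in place
    for f in \
      weights/transformer/config.json \
      weights/transformer/diffusion_pytorch_model.safetensors \
      models/Wan2.1-Fun-1.3B-InP/diffusion_pytorch_model.safetensors \
      models/Wan2.1-Fun-1.3B-InP/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
      models/Wan2.1-Fun-1.3B-InP/models_t5_umt5-xxl-enc-bf16.pth \
      models/Wan2.1-Fun-1.3B-InP/Wan2.1_VAE.pth
    do
      [ -f "$f" ] && echo "ok      $f" || echo "MISSING $f"
    done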

For the Gradio demo, we also use the pretrained SAM model to generate masks.

For more information about the Gradio demo, please check out the README under the hugging_face folder.

The complete weights directory structure for the Gradio demo is arranged as:

weights
 ├── transformer
   ├── config.json
   ├── diffusion_pytorch_model.safetensors
 ├── cutie-base-mega.pth
 ├── sam_vit_h_4b8939.pth
 ├── download_sam_ckpt.sh
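
The SAM checkpoint can be downloaded manually and placed as shown above, or fetched with the bundled script (an assumption based on its name; check the script before running it):

    # assumption: download_sam_ckpt.sh fetches sam_vit_h_4b8939.pth into weights/
    cd weights && bash download_sam_ckpt.sh && cd ..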

🏂 Quick test

We provide some examples in the data/eval folder. Run the following command to try them out:

python inference.py 
Usage:

python inference.py [options]

Options:
  --validation_videos  Path(s) to input videos 
  --validation_masks   Path(s) to mask videos 
  --validation_prompts Text prompts (default: [""])
  --output_dir         Output directory 
  --video_length       Number of frames per video (It needs to be 16n+1.)
  --sample_size        Frame size: height width (default: 480 720)
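
A typical invocation might look like the sketch below; the video and mask paths are placeholders, so point them at an actual pair from data/eval (or your own data):

    # example run (placeholder paths; video_length must be 16n+1, e.g. 49)
    python inference.py \
      --validation_videos data/eval/example_video.mp4 \
      --validation_masks data/eval/example_mask.mp4 \
      --validation_prompts "" \
      --output_dir output \
      --video_length 49 \
      --sample_size 480 720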

💃🏻 Interactive Demo

We also provide an interactive demo for object removal, allowing users to select any object they wish to remove from a video. You can try the demo on Hugging Face or run it locally.
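
To run it locally, the usual Gradio pattern applies; the entry-point name below is an assumption, so check the README under the hugging_face folder for the actual script:

    # assumed entry point for the local Gradio demo (see the hugging_face README for the real one)
    python hugging_face/app.py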

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{miao2025rose,
   title={ROSE: Remove Objects with Side Effects in Videos}, 
   author={Miao, Chenxuan and Feng, Yutong and Zeng, Jianshu and Gao, Zixiang and Liu, Hantang and Yan, Yunfeng and Qi, Donglian and Chen, Xi and Wang, Bin and Zhao, Hengshuang},
   journal={arXiv preprint arXiv:2508.18633},
   year={2025}
}

Contact

If you have any questions, please feel free to reach out to me at weiyuchoumou526@gmail.com.

Acknowledgement

This code is based on Wan2.1-Fun-1.3B-Inpaint, and some code is borrowed from ProPainter. Thanks for their awesome work!

