Yui010206/VEGGIE-VidEdit

[ICCV 2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Official implementation of VEGGIE, a unified and versatile video generative model that handles both video concept grounding and video editing tasks according to user instructions.

Shoubin Yu*, Difan Liu*, Ziqiao Ma*, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

Adobe Research, University of Michigan, University of North Carolina at Chapel Hill

Project Website | arXiv | HuggingFace





Release Item / Timeline

  • Data Generation Pipeline
  • VEGGIE Model Training Code
  • Evaluation Code

Instructional Video Editing Examples

*: Non-Instructional methods utilize paired video captions for editing.

Each example compares the same eight video columns, left to right: Input Video, VEGGIE, VidToMe*, TokenFlow*, Flatten*, InstructDiff, LGVI, InsV2V. The instructions shown are:

  • Make it on the beach.
  • Please add a ball in the given video frames.
  • Make it Chinese ink style.
  • Could you label the bear in these video frames with red color masks?
  • Replace the cup with a bottle of flower.
  • Please remove the man in black in given video frames.
  • Make the swan white.
  • What can be used for heating food? Highlight your answer with red masks.

Reference

Please cite our paper if you use our models in your work:

@article{yu2025veggie,
        title={VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation}, 
        author={Shoubin Yu and Difan Liu and Ziqiao Ma and Yicong Hong and Yang Zhou and Hao Tan and Joyce Chai and Mohit Bansal},
        year={2025},
        journal={arXiv:2503.14350},
}
