Official implementation of VEGGIE, a unified versatile video generative model that handles various tasks for both video concept grounding and editing according to user instructions.
Shoubin Yu*, Difan Liu*, Ziqiao Ma*, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal
Adobe Research, University of Michigan, University of North Carolina at Chapel Hill
- Data Generation Pipeline
- VEGGIE Model Training Code
- Evaluation Code
*: Non-instructional methods, which use paired video captions for editing.
Input Video | VEGGIE | VidToMe* | TokenFlow* | Flatten* | InstructDiff | LGVI | InsV2V |
---|---|---|---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Please cite our paper if you use our models in your work:
@article{yu2025veggie,
title={VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation},
author={Shoubin Yu and Difan Liu and Ziqiao Ma and Yicong Hong and Yang Zhou and Hao Tan and Joyce Chai and Mohit Bansal},
year={2025},
journal={arXiv:2503.14350},
}