We propose Frame In-N-Out, a controllable image-to-video generation framework in which objects can enter or exit the scene along user-defined motion trajectories. Our method introduces a new dataset curation procedure, an evaluation protocol, and a motion-controllable, identity-preserving video Diffusion Transformer to achieve Frame In and Frame Out in the cinematic domain.
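As a rough illustration of the "frame in / frame out" idea, a user-defined motion trajectory can be thought of as a sequence of (x, y) control points on the canvas, where points lying outside the frame bounds mark timesteps at which the object is off-screen. The function and variable names below are purely hypothetical and are not the project's actual API:

```python
# Hypothetical sketch: a trajectory is a list of (x, y) control points;
# a point outside the canvas bounds means the object is off-screen
# (frame out) at that timestep.

def classify_trajectory(points, width, height):
    """Return a per-point list: True if the point lies inside the frame."""
    return [0 <= x < width and 0 <= y < height for x, y in points]

# An object that enters from the left edge and exits past the right edge
# of a 512x288 canvas:
traj = [(-40, 120), (60, 128), (300, 140), (560, 150)]
inside = classify_trajectory(traj, width=512, height=288)
# First and last points are off-canvas: the object frames in, then frames out.
```
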
- Release the paper
- Release the model weights (CogVideoX)
- Release an online Gradio demo
- Release the training code
- Release the processed training dataset