Thanks for the good work. To encode temporal information, ARC-Hunyuan-Video-7B overlays timestamps directly onto the video frames. What is the motivation for this approach? What are its advantages over alternatives such as position encoding (e.g., Qwen-VL and Keye-VL) or prepending timestamp tokens to the video frame tokens (e.g., Seed1.5-VL)? Is there an ablation study that validates these advantages?