
Questions About InternVideo2clip Training Data and Fine-Tuning Requirements #293

@JayChen7777

Description


Thank you for your work! I have a question:

In the paper, it is stated: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2clip. It is post-pretrained from InternVideo2s2 by only preserving video and text encoders and contrastive loss."
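For context, the CLIP-style contrastive objective described in that quote (matching video and text embeddings within a batch) can be sketched roughly as below. This is only an illustrative NumPy sketch of a symmetric InfoNCE loss, not the authors' actual implementation; the function name and the temperature value are my own assumptions.

```python
import numpy as np

def clip_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings."""
    # Normalize embeddings to unit length so dot products are cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits, scaled by temperature
    logits = v @ t.T / temperature
    n = logits.shape[0]
    labels = np.arange(n)  # the i-th video matches the i-th text

    def xent(l):
        # Cross-entropy with the matching pair as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the video->text and text->video directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With well-aligned pairs the loss approaches zero, while mismatched pairs are penalized; post-pretraining InternVideo2clip amounts to optimizing only this objective over the retained video and text encoders.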

May I ask what training dataset was used for InternVideo2clip? Does it include any Chinese data? Approximately how much data would be required to fine-tune it effectively?

I also noticed that, in the official weights, only the attnpool of the vision encoder has been released: https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4
