Questions on Dataset Subset Selection and Filtering Criteria for WebVid-10M and Panda-70M

Hello, thank you for the great work.

I have a couple of questions regarding the dataset preprocessing:

For WebVid-10M, you mentioned using the LLaMA-3 model to filter out videos whose captions do not describe dynamic content. Could you please share the criteria (for example, the prompt) or the code used to decide whether a caption contains dynamic content?
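To make the question concrete, this is roughly what I imagined. It is a minimal sketch, not your method: `ask_llm` is a hypothetical stand-in for a LLaMA-3 call, and both the prompt wording and the yes/no parsing are my own assumptions. I'd like to know how the actual criteria differ from this.

```python
from typing import Callable, Iterable, List

# Hypothetical prompt wording (my assumption); the real WebVid-10M prompt is unknown.
PROMPT_TEMPLATE = (
    "Does the following video caption describe motion or a change over time "
    "(dynamic content), rather than a static scene? Answer 'yes' or 'no'.\n"
    "Caption: {caption}\nAnswer:"
)


def is_dynamic(caption: str, ask_llm: Callable[[str], str]) -> bool:
    """Return True if the model's reply to the prompt starts with 'yes'."""
    # ask_llm is a hypothetical callable wrapping a LLaMA-3 text-generation call.
    reply = ask_llm(PROMPT_TEMPLATE.format(caption=caption))
    return reply.strip().lower().startswith("yes")


def filter_dynamic(captions: Iterable[str], ask_llm: Callable[[str], str]) -> List[str]:
    """Keep only the captions the model judges to contain dynamic content."""
    return [c for c in captions if is_dynamic(c, ask_llm)]
```

In particular, I'm unsure whether you used a binary yes/no answer like this, a score with a threshold, or something else.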

For Panda-70M, you stated that 5.3 million videos were downloaded. Could you clarify which subset of the dataset these videos came from and how they were selected?

Thank you in advance for your help!
