Attention-Guided Token Selection Algorithm in InternVideo 2.5

Hi OpenGVLab team, thank you very much for all your excellent models.

Section 3.1 of the InternVideo 2.5 paper mentions:

> (1) uniform token pruning in early layers to maintain structural integrity while reducing computational overhead, and (2) attention-guided token selection in deeper layers to retain task-relevant essences.

Regarding the second point, attention-guided token selection: could you please share the specific method you used? Since this process depends on attention weights, and Flash Attention 2 does not materialize the full attention matrix, the two may not be compatible. Does computing the attention weights separately lead to excessive memory consumption?
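
To make the question concrete, here is a minimal PyTorch sketch of what I imagine the two stages could look like. This is only my guess, not the method from the paper: the function names `uniform_prune` and `attention_guided_select`, the idea of scoring visual tokens with a small set of query tokens (for example text tokens), and averaging over heads are all my own assumptions.

```python
import torch


def uniform_prune(tokens: torch.Tensor, keep_every: int) -> torch.Tensor:
    """Hypothetical stage 1: keep every `keep_every`-th token along the sequence.

    tokens: (B, N, C)
    """
    return tokens[:, ::keep_every]


def attention_guided_select(
    q: torch.Tensor,       # (B, H, Lq, D) queries of a few scoring tokens (assumed: text tokens)
    k: torch.Tensor,       # (B, H, N, D)  keys of the visual tokens
    hidden: torch.Tensor,  # (B, N, C)     visual hidden states to be pruned
    num_keep: int,
):
    """Hypothetical stage 2: keep the `num_keep` visual tokens with the highest attention."""
    scale = q.shape[-1] ** -0.5
    # Only an (Lq x N) score map per head is built, never the full (N x N) map.
    attn = torch.softmax((q @ k.transpose(-1, -2)) * scale, dim=-1)  # (B, H, Lq, N)
    # Assumption: average over heads and scoring tokens to get one score per visual token.
    scores = attn.mean(dim=(1, 2))  # (B, N)
    # Sort the selected indices so the kept tokens stay in their original order.
    keep = scores.topk(num_keep, dim=-1).indices.sort(dim=-1).values  # (B, num_keep)
    idx = keep.unsqueeze(-1).expand(-1, -1, hidden.shape[-1])
    return hidden.gather(1, idx), keep
```

If the selection works roughly like this, the extra memory would be B x H x Lq x N for the scores, which stays small when Lq is small, and the regular attention path could still use Flash Attention 2. If the scores instead come from the full N x N attention map, that would be much more expensive, which is why I am asking.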


Attention-Guided Token Selection Algorithm in InternVideo 2.5 #302
