
How to properly implement attention mask? #2

@codars


Question
Here again 😄! I'm finding it hard to work out how to apply an attention mask in fast attention. Could you please shed some light on that?

I think I should zero out some rows of Q' and K' according to the attention_mask, since Q' @ K'.T equals the attention matrix A, but is that correct?
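For a padding mask, zeroing the masked rows of K' should indeed be enough (the rows of Q' for padded positions only affect outputs that are discarded anyway). Here is a minimal NumPy sketch, assuming Q' and K' are the feature-mapped queries/keys and the mask marks valid key positions with 1; it checks that zeroing rows of K' gives the same result as masking A = Q' @ K'.T directly:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, dv = 5, 4, 2
Qp = rng.random((L, d))   # Q': feature-mapped queries (hypothetical stand-in)
Kp = rng.random((L, d))   # K': feature-mapped keys
V = rng.random((L, dv))

mask = np.array([1, 1, 1, 0, 0], dtype=bool)  # last two key positions are padding

# Reference: build A = Q' K'^T, mask its columns, then normalize and aggregate
A = (Qp @ Kp.T) * mask[None, :]
out_ref = (A @ V) / A.sum(-1, keepdims=True)

# Linear-attention order: zero masked rows of K' and never materialize A
Kp_m = Kp * mask[:, None]
num = Qp @ (Kp_m.T @ V)             # numerator:   Q' (K'^T V)
den = (Qp @ Kp_m.sum(0))[:, None]   # denominator: Q' (K'^T 1)
out = num / den

assert np.allclose(out, out_ref)
```

This works because masking columns of A = Q' K'.T is the same as replacing masked rows of K' with zeros before the contraction, so the O(L·d) factored form stays intact. Note this covers padding masks only; a causal mask needs a different treatment (e.g. prefix sums over the K'V products), since it cannot be expressed by zeroing whole rows.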
