
How to properly implement attention mask? #2

@codars


Question
Here again 😄! I'm finding it hard to work out how to apply an attention mask in fast attention. Could you please shed some light on that?

I think I should zero out some rows of Q' and K' according to the attention_mask, since Q' @ K'.T equals the attention matrix A, but is that correct?
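For a padding mask, zeroing the masked rows of K' should indeed be enough (the rows of Q' for padded positions only affect outputs that are discarded anyway). Here is a minimal NumPy sketch, assuming Q' and K' are the feature-mapped queries/keys and the mask marks valid key positions with 1; it checks that zeroing rows of K' gives the same result as masking A = Q' @ K'.T directly:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, dv = 5, 4, 2
Qp = rng.random((L, d))   # Q': feature-mapped queries (hypothetical stand-in)
Kp = rng.random((L, d))   # K': feature-mapped keys
V = rng.random((L, dv))

mask = np.array([1, 1, 1, 0, 0], dtype=bool)  # last two key positions are padding

# Reference: build A = Q' K'^T, mask its columns, then normalize and aggregate
A = (Qp @ Kp.T) * mask[None, :]
out_ref = (A @ V) / A.sum(-1, keepdims=True)

# Linear-attention order: zero masked rows of K' and never materialize A
Kp_m = Kp * mask[:, None]
num = Qp @ (Kp_m.T @ V)             # numerator:   Q' (K'^T V)
den = (Qp @ Kp_m.sum(0))[:, None]   # denominator: Q' (K'^T 1)
out = num / den

assert np.allclose(out, out_ref)
```

This works because masking columns of A = Q' K'.T is the same as replacing masked rows of K' with zeros before the contraction, so the O(L·d) factored form stays intact. Note this covers padding masks only; a causal mask needs a different treatment (e.g. prefix sums over the K'V products), since it cannot be expressed by zeroing whole rows.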
