**Question** Here again 😄! I'm having a hard time figuring out how to apply an attention mask to fast attention; can you please shed some light on that? My idea is to set some rows of `Q'` and `K'` to 0 according to the `attention_mask`, since `Q' @ K'.T` equals the matrix `A`. Is that correct?
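To make the idea concrete, here is a small NumPy sketch of what I have in mind (shapes and names are made up, and the features are assumed positive as in Performer-style fast attention): zero out the rows of `K'` that correspond to padded keys, so those keys contribute nothing to the `K'.T @ V` and normalizer sums, which should match masking the columns of `A = Q' @ K'.T` before row-normalizing.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, m = 5, 4, 8  # sequence length, head dim, number of random features (hypothetical)

Q_prime = rng.random((L, m))  # kernel features of Q (positive, as in Performer)
K_prime = rng.random((L, m))  # kernel features of K
V = rng.random((L, d))

mask = np.array([1, 1, 1, 0, 0], dtype=bool)  # last two tokens are padding

# Zero out the kernel features of masked keys so they contribute
# nothing to the sums below (this is my guess at how masking should work).
K_prime_masked = K_prime * mask[:, None]

# Linear attention: out = (Q' (K'^T V)) / (Q' (K'^T 1))
numer = Q_prime @ (K_prime_masked.T @ V)              # (L, d)
denom = Q_prime @ K_prime_masked.sum(axis=0)          # (L,)
out = numer / denom[:, None]

# Reference: mask the columns of A = Q' @ K'.T directly, then row-normalize.
A = Q_prime @ K_prime.T
A_masked = A * mask[None, :]
out_ref = (A_masked / A_masked.sum(axis=1, keepdims=True)) @ V

assert np.allclose(out, out_ref)  # both ways agree
```

So if I understand correctly, masking keys only requires touching `K'`, while zeroing rows of `Q'` would instead blank out the *outputs* at masked query positions rather than removing those positions from the attention sums.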