-
Replies: 3 comments
-
The data remains in the memory buffer and is still included in the attention computation, but the attention mask then discards its contribution.
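To make that concrete, here is a minimal sketch in plain Python. It is not the code from the repository under discussion: `masked_attention`, the buffer layout, and the boolean `mask` convention are all assumptions chosen for illustration. It shows that scores are computed for every slot in the buffer, and that a masked slot ends up with zero weight, so whatever stale data sits there cannot change the output.

```python
import math

def masked_attention(query, keys, values, mask):
    """Single-query scaled dot-product attention over a buffer (illustrative only).

    mask[i] is True if slot i may be attended to, False if it must be ignored.
    Assumes at least one slot is unmasked.
    """
    scale = 1.0 / math.sqrt(len(query))
    # Scores are computed for every slot, including the masked ones.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Masked slots are set to -inf, so their softmax weight becomes exactly 0.
    scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights

query = [1.0, 0.0]
mask = [True, True, False]  # slot 2 holds stale data and is masked out

keys_a = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values_a = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]

# Same buffer, but with different stale contents in the masked slot.
keys_b = [[1.0, 0.0], [0.0, 1.0], [-7.0, 2.0]]
values_b = [[1.0, 2.0], [3.0, 4.0], [-999.0, 999.0]]

out_a, weights_a = masked_attention(query, keys_a, values_a, mask)
out_b, weights_b = masked_attention(query, keys_b, values_b, mask)

print(weights_a[2])  # weight of the masked slot
print(all(abs(a - b) < 1e-12 for a, b in zip(out_a, out_b)))
```

The masked slot still costs compute, since its score is evaluated before the mask is applied, but its contents never reach the result.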
-
Thanks for your reply! More details for anyone who has the same question (as I understand it):
-
The version of the code I used.