- What I think is happening is that this is one of the side effects of the inefficient attention implementation on master. It should be fixed with FlashAttention; if not, I will take another look.
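To illustrate the direction of that fix (a sketch only, not the actual implementation on master or in the FlashAttention work): a FlashAttention-style kernel walks the KV cache in tiles with an online softmax, so only a fixed-size score block is ever materialized, however long the context grows. A minimal numpy version:

```python
import numpy as np

def tiled_attention(q, k, v, tile=256):
    """FlashAttention-style tiling with an online softmax: only a
    [n_q, tile] block of scores is live at any time, so scratch memory
    stays constant no matter how long the KV cache grows."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n_q, -np.inf)            # running row-wise max of scores
    denom = np.zeros(n_q)                # running softmax denominator
    acc = np.zeros((n_q, d))             # running weighted sum over V
    for start in range(0, k.shape[0], tile):
        kb = k[start:start + tile]
        vb = v[start:start + tile]
        s = (q @ kb.T) * scale           # [n_q, tile] -- the only score block
        m_new = np.maximum(m, s.max(axis=1))
        alpha = np.exp(m - m_new)        # rescale earlier partial results
        p = np.exp(s - m_new[:, None])
        denom = denom * alpha + p.sum(axis=1)
        acc = acc * alpha[:, None] + p @ vb
        m = m_new
    return acc / denom[:, None]

# Sanity check against the naive path that builds the full score matrix:
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 64))
k = rng.standard_normal((4096, 64))
v = rng.standard_normal((4096, 64))
s = (q @ k.T) / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(tiled_attention(q, k, v), ref)
```

The running max and denominator (`m`, `denom`) are what let the softmax be computed incrementally, without ever holding the full `[n_q, n_kv]` score matrix that the naive path allocates.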
- I notice that the larger the batch size, the more memory it takes to run consecutive batches. This confuses me: shouldn't earlier batches have no impact on later ones? That is, the memory from earlier batches should be freed, so usage shouldn't grow beyond the first batch. I OOM on the third batch when I set the batch size to 2048 tokens. Is this related to, or caused by, the fact that Flash Attention is not yet supported?
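For intuition on why that can happen, here is a back-of-envelope sketch. It assumes the naive attention path materializes a full `[n_batch, n_kv]` score matrix per head and that the KV cache grows by `n_batch` after each batch; the head count and score precision below are made-up illustrative values, not measured ones:

```python
# Back-of-envelope sketch (assumed naive implementation, made-up model
# numbers): each batch materializes a [n_batch, n_past + n_batch] score
# matrix per head, and n_past grows by n_batch after every batch --
# so the scratch buffer grows with each consecutive batch.
n_batch = 2048           # tokens per batch ("bs" above)
n_head = 32              # assumption: a typical 7B-class model
bytes_per_score = 4      # assumption: fp32 scores

for i in range(1, 4):
    n_past = (i - 1) * n_batch                 # tokens already in the KV cache
    entries = n_batch * (n_past + n_batch)     # score-matrix entries per head
    mib = entries * n_head * bytes_per_score / 2**20
    print(f"batch {i}: scores {n_batch} x {n_past + n_batch} per head "
          f"=> ~{mib:.0f} MiB across {n_head} heads")
# batch 1 => ~512 MiB, batch 2 => ~1024 MiB, batch 3 => ~1536 MiB
```

Under those assumptions the scratch buffer roughly triples by the third batch, which would line up with an OOM appearing only then.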