[Sliding Window Attention] BlockManagerV2 allocate with SWA #7468
Closed
yangsijia-serena announced in Q&A
Replies: 0 comments
Hi there, I'm new to vLLM and may have missed something, but in BlockManagerV2 the sliding window seems to be considered only in the can_allocate function, as in the following code snippet:
However, I don't see the sliding window taken into account when the allocation is actually performed. Is this by design, or a potential bug? If it is by design, consider a scenario where the entire prompt requires 4 blocks but only 3 blocks are free. With max_block_sliding_window=3, can_allocate would return True, yet the actual allocation would have no space for the tokens in the 4th block. Is this a known issue, or is it handled in some other way?
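To make the scenario concrete, here is a toy sketch of the mismatch I have in mind. This is not the vLLM code: blocks_needed, can_allocate_toy and allocate_toy are hypothetical names, the block size of 16 is an assumption, and the allocation step deliberately ignores the window, since that is the behavior I am asking about.

```python
from typing import List, Optional


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold num_tokens (ceiling division)."""
    return -(-num_tokens // block_size)


def can_allocate_toy(
    num_tokens: int,
    block_size: int,
    num_free_blocks: int,
    max_block_sliding_window: Optional[int] = None,
) -> bool:
    """Admission check that caps the required block count at the window size."""
    required = blocks_needed(num_tokens, block_size)
    if max_block_sliding_window is not None:
        required = min(required, max_block_sliding_window)
    return num_free_blocks >= required


def allocate_toy(num_tokens: int, block_size: int, num_free_blocks: int) -> List[int]:
    """Allocation step that (by assumption) does NOT apply the window cap."""
    required = blocks_needed(num_tokens, block_size)
    if required > num_free_blocks:
        raise ValueError(
            f"need {required} blocks but only {num_free_blocks} are free"
        )
    # Hand out block ids 0..required-1 as stand-ins for physical blocks.
    return list(range(required))


# Scenario from the question: a 64-token prompt with 16-token blocks needs
# 4 blocks, only 3 are free, and the window is capped at 3 blocks.
admitted = can_allocate_toy(64, 16, num_free_blocks=3, max_block_sliding_window=3)
print("can_allocate:", admitted)

try:
    allocate_toy(64, 16, num_free_blocks=3)
except ValueError as exc:
    print("allocate failed:", exc)
```

In this sketch the check admits the request while the allocation step fails, which is exactly the gap I would expect if the real allocation path never applies the window. If the real allocator reuses or drops blocks that fall outside the window, the check and the allocation would agree and the mismatch would not occur.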