@LinPoly Thanks for the question! This is not a bug. The watermark is there to prevent frequent preemptions (i.e., swapping or recomputation), which can be caused by accepting too many new requests into the batch. Requests that are already in the batch, on the other hand, should be able to use every slot in the KV cache.
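To make the distinction concrete, here is a minimal sketch of the idea. It is a simplified toy, not the actual vLLM code: the class `ToyBlockManager`, its fields, and the integer-based method signatures are hypothetical and only mirror the method names mentioned in the question.

```python
class ToyBlockManager:
    """Toy model of watermark-based admission (hypothetical, not vLLM's BlockSpaceManager)."""

    def __init__(self, num_gpu_blocks: int, watermark: float = 0.01) -> None:
        self.num_free_blocks = num_gpu_blocks
        # Number of blocks kept in reserve when admitting new requests.
        self.watermark_blocks = int(watermark * num_gpu_blocks)

    def can_allocate(self, num_required_blocks: int) -> bool:
        # Admitting a NEW request: the reserve must still be free afterwards.
        return self.num_free_blocks - num_required_blocks >= self.watermark_blocks

    def can_append_slot(self, num_running_seqs: int) -> bool:
        # Growing EXISTING requests: no reserve is applied. In the worst case
        # each running sequence needs one fresh block for its next token.
        return num_running_seqs <= self.num_free_blocks


manager = ToyBlockManager(num_gpu_blocks=100, watermark=0.25)
manager.num_free_blocks = 30  # pretend 70 blocks are already in use

print(manager.can_allocate(10))     # new request would dip into the reserve
print(manager.can_append_slot(30))  # running requests may use every free block
```

In this toy setup the new request is rejected because it would leave fewer than 25 free blocks, while the running sequences are still allowed to consume all 30 remaining blocks. The reserve only throttles admission; it is not meant to starve requests that are already being served.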
In the current version of the code (0.1.1), I noticed that both the can_allocate() and can_swap_in() methods of the BlockSpaceManager class take the watermark into account, while can_append_slot() doesn't. It seems that all three should use the same mechanism for GPU memory management.