KV cache usage on CPU #7431
akhilreddy0703 announced in Q&A
Replies: 0 comments
Hi all, @tmm1, @zhouyuan
Can anyone please help me understand how the vLLM server uses memory for the KV cache?
I ran an inference test with the vLLM server on CPU (as a Docker container), started with --env "VLLM_CPU_KVCACHE_SPACE=40".
I checked the KV cache memory usage reported in the server logs.
The image below shows the vLLM server logs.
What does "GPU KV cache usage" mean here, given that I deployed the container on CPU only?
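For context on what the 40 setting could correspond to, here is a rough back-of-the-envelope sketch of how a KV cache budget translates into token capacity. It assumes `VLLM_CPU_KVCACHE_SPACE` is measured in GiB, that each token stores one key and one value vector per layer, and that the cache is carved into fixed-size blocks. The function names and the model shape (32 layers, 32 KV heads, head dimension 128, 2-byte elements, block size 16) are hypothetical values chosen for illustration, not taken from my deployment or from vLLM's code.

```python
GIB = 1 << 30  # assumption: the env var is interpreted in GiB


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes needed to cache one token across all layers.

    The factor 2 accounts for storing both a key and a value vector.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_capacity_tokens(space_gib, num_layers, num_kv_heads, head_dim,
                             dtype_bytes=2, block_size=16):
    """Estimate how many tokens fit in a block-allocated KV cache."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    per_block = per_token * block_size
    num_blocks = (space_gib * GIB) // per_block  # only whole blocks are usable
    return num_blocks * block_size


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    capacity = kv_cache_capacity_tokens(40, num_layers=32, num_kv_heads=32, head_dim=128)
    print(f"Estimated capacity: {capacity} tokens")
```

Under these assumptions, a usage percentage in the logs would be the share of those blocks currently held by running requests, but that reading is my guess rather than something confirmed by the logs.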