How can I enable PagedAttention for Llama-3-8B in vLLM? #8883
Blueblack319 announced in Q&A
I'm running the Llama-3-8B model in vLLM and inspected the resulting nsys report. According to the report, neither paged_attention_v1_kernel nor paged_attention_v2_kernel was launched. To dig further, I checked which attention backend is selected in the get_attn_backend() function and found that is_blocksparse is always set to false. How can I enable PagedAttention for the Llama-3-8B model?
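For context, here is a minimal sketch of the kind of invocation I'm profiling. The model path is illustrative, and the VLLM_ATTENTION_BACKEND override is only my assumption about how to force a particular backend; I'm not sure which backend actually dispatches to the paged_attention_v1/v2 kernels.

```python
import os

# Assumption: forcing the backend via this env var before importing vLLM.
# Whether "XFORMERS" is the backend that launches paged_attention_v1/v2
# during decode is exactly what I'm unsure about.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

# Illustrative model path; I'm running a local Llama-3-8B checkpoint.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Explain PagedAttention in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```

I profile this script with nsys and then look for the paged_attention_* kernels in the kernel summary.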