With vLLM's v1 engine enabling prefix caching by default, we need a way to consistently test it from the client side. vLLM's built-in benchmarks already support this, so they can serve as a sample.
As we know, the throughput improvements are considerable with a good cache hit rate.
Linking to vLLM v1 blog: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html#:~:text=3.%20Zero%2DOverhead%20Prefix%20Caching
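
Not a spec for the eventual benchmark, just a minimal sketch of what a client-side prefix-cache check could look like against a vLLM OpenAI-compatible server. The endpoint URL, model name, prefix length, and question text below are placeholders; the idea is simply to send requests that share a long common prefix and compare cold vs. warm latencies.

```python
import time
import requests

# Assumed: a vLLM OpenAI-compatible server is already running locally
# (v1 engine, prefix caching on by default). URL and model are placeholders.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

# Long shared prefix so repeated requests can reuse cached prefix blocks.
SHARED_PREFIX = "You are a helpful assistant. " * 200


def timed_request(prompt: str) -> float:
    """Send one completion request and return wall-clock latency in seconds."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 16,
        "temperature": 0.0,
    }
    start = time.perf_counter()
    resp = requests.post(BASE_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start


# The first request populates the prefix cache; subsequent requests with the
# same prefix should show noticeably lower latency if caching is effective.
cold = timed_request(SHARED_PREFIX + "\nQuestion 0: what is 2 + 2?")
warm = [
    timed_request(SHARED_PREFIX + f"\nQuestion {i}: what is {i} + {i}?")
    for i in range(1, 6)
]

print(f"cold request: {cold:.3f}s")
print("warm requests (shared prefix):", [f"{t:.3f}s" for t in warm])
```

The in-tree vLLM benchmarks mentioned above presumably track richer metrics (TTFT, throughput at various hit rates); this sketch only checks end-to-end latency as a basic sanity test of cache reuse.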