[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue #21005

JialinOuyang-Meta · 2025-07-15T17:37:58Z

Summary:

Optimizations

Introducing fake head and tail (sentinel) nodes is a common trick in doubly linked list implementations. It significantly reduces implementation overhead and makes it easy to get rid of dataclass.__eq__ comparisons.

- No dataclass.__eq__ invocation
- Shorter code
- Branchless

Combined, these should yield a significant performance improvement for this code path.
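A minimal sketch of the sentinel technique described above, assuming a simplified block type. `Block`, `FreeBlockQueue`, `prev_free`, `next_free`, and `block_ids` are hypothetical names for illustration, not vLLM's actual `KVCacheBlock`/`FreeKVCacheBlockQueue` code. Because the fake head and tail always bracket the real blocks, `remove` can unlink unconditionally, and the only emptiness test is an identity check (`is`), so the dataclass-generated `__eq__` is never called.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    # Hypothetical, simplified stand-in for KVCacheBlock.
    block_id: int
    prev_free: Optional[Block] = field(default=None, repr=False)
    next_free: Optional[Block] = field(default=None, repr=False)


class FreeBlockQueue:
    """Doubly linked free list with fake head/tail (sentinel) nodes.

    The sentinels are always present, so every real block has a non-None
    prev and next. remove() therefore never branches on "is this the
    head/tail?" and no block is ever compared with ==.
    """

    def __init__(self, blocks: list[Block]) -> None:
        self.num_free_blocks = len(blocks)
        # Sentinel ids are arbitrary: sentinels are only ever checked with `is`.
        self.head = Block(block_id=-1)
        self.tail = Block(block_id=-1)
        # Chain: head <-> blocks[0] <-> ... <-> blocks[-1] <-> tail
        nodes = [self.head, *blocks, self.tail]
        for left, right in zip(nodes, nodes[1:]):
            left.next_free = right
            right.prev_free = left

    def remove(self, block: Block) -> None:
        # Branchless unlink: both neighbours always exist.
        block.prev_free.next_free = block.next_free
        block.next_free.prev_free = block.prev_free
        block.prev_free = block.next_free = None
        self.num_free_blocks -= 1

    def popleft(self) -> Block:
        first = self.head.next_free
        if first is self.tail:  # identity check only: the queue is empty
            raise ValueError("No free blocks available")
        self.remove(first)
        return first

    def append(self, block: Block) -> None:
        last = self.tail.prev_free
        last.next_free = block
        block.prev_free = last
        block.next_free = self.tail
        self.tail.prev_free = block
        self.num_free_blocks += 1

    def block_ids(self) -> list[int]:
        out: list[int] = []
        node = self.head.next_free
        while node is not self.tail:
            out.append(node.block_id)
            node = node.next_free
        return out


if __name__ == "__main__":
    blocks = [Block(i) for i in range(4)]
    queue = FreeBlockQueue(blocks)
    queue.popleft()            # takes block 0
    queue.remove(blocks[2])    # O(1) removal from the middle
    queue.append(blocks[0])    # return block 0 to the tail
    print(queue.block_ids())   # [1, 3, 0]
```

The trade-off is two extra node objects per queue in exchange for O(1) updates that never compare blocks.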

Observations

Per vLLM profiling, kv_cache_manager.allocate_slots accounts for a non-negligible cost in each prefill.

(Profiling screenshots: F1980260529, F1980260481, F1980260497)

Zooming in, the FreeKVCacheBlockQueue.popleft stack is non-trivial: popleft -> remove -> string.__eq__, which mainly comes from the dataclass (i.e. KVCacheBlock) equality comparison.

Per the Python dataclasses library documentation (https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass):

dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type.

If the class already defines __eq__(), this parameter is ignored.
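A small sketch of the documented behavior, using hypothetical classes: with the default `eq=True`, `==` runs the generated `__eq__` and compares field values, so two distinct blocks with equal fields compare equal; with `eq=False`, no `__eq__` is generated and `==` falls back to object identity. The `eq=False` variant is shown only to illustrate the parameter; the PR itself removes the comparisons by restructuring the queue.

```python
from dataclasses import dataclass


@dataclass  # eq=True by default: generated __eq__ compares field values
class ValueBlock:
    block_id: int
    ref_cnt: int = 0


@dataclass(eq=False)  # no generated __eq__: == falls back to identity
class IdentityBlock:
    block_id: int
    ref_cnt: int = 0


a, b = ValueBlock(7), ValueBlock(7)
print(a == b, a is b)  # True False

c, d = IdentityBlock(7), IdentityBlock(7)
print(c == d, c == c)  # False True
```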

Test Plan:

Result

Typically, block_size is set to 16, so in production we would likely allocate 10-1000 blocks. In this range, the optimization saved up to ~1 ms of TTFT, with larger improvements on longer inputs.
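As a worked example of the arithmetic (assuming each block holds exactly block_size tokens and ignoring prefix-cache hits), the 10-1000 block range corresponds to prompts of roughly 160 to 16,000 tokens. `num_blocks` is a hypothetical helper, not a vLLM API.

```python
BLOCK_SIZE = 16  # typical value cited above


def num_blocks(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of KV cache blocks needed for num_tokens (integer ceiling division)."""
    return (num_tokens + block_size - 1) // block_size


if __name__ == "__main__":
    for tokens in (160, 161, 16_000):
        print(tokens, "->", num_blocks(tokens))
    # 160 -> 10, 161 -> 11, 16000 -> 1000
```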

Benchmark

(Benchmark screenshots: After vs. Before)
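A rough micro-benchmark sketch (not the PR's benchmark) of the per-call gap between a dataclass `==` and an identity check, using the standard-library `timeit` module. The `Block` fields are hypothetical, and absolute numbers depend on the Python version and machine, so no figures are claimed here.

```python
from __future__ import annotations

import timeit
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical KVCacheBlock-like dataclass; field names are illustrative.
    block_id: int
    ref_cnt: int = 0
    block_hash: object = None


def compare_costs(number: int = 200_000) -> dict[str, float]:
    """Total seconds for `number` runs of `a == b` vs. `a is b`."""
    env = {"a": Block(1), "b": Block(2)}
    return {
        "eq": timeit.timeit("a == b", globals=env, number=number),
        "is": timeit.timeit("a is b", globals=env, number=number),
    }


if __name__ == "__main__":
    for name, secs in compare_costs().items():
        print(f"{name}: {secs:.4f}s")
```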

Stack

(Stack screenshots: After vs. Before)

Rollback Plan:

Reviewed By: CuiCoco

Differential Revision: D78292345

github-actions · 2025-07-15T17:38:06Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

facebook-github-bot · 2025-07-15T17:38:13Z

This pull request was exported from Phabricator. Differential Revision: D78292345

gemini-code-assist

Code Review

This pull request introduces a significant performance optimization to the FreeKVCacheBlockQueue by implementing a doubly linked list with sentinel nodes. This change effectively removes expensive __eq__ comparisons on KVCacheBlock dataclasses, which should improve performance as demonstrated by the new benchmark. The implementation is a classic and well-executed approach.

My review focuses on ensuring the robustness of this new implementation. I've identified a couple of areas where adding validation checks could prevent potential crashes from state inconsistencies, making the system more resilient. These changes should have a negligible performance impact while significantly improving debuggability and correctness guarantees.
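The review's concrete suggestions are not reproduced in this thread. As a hypothetical illustration of the kind of cheap O(1) check it describes, the sketch below fails loudly when a block that is not linked into a sentinel-based free list is removed (for example, a double remove), instead of crashing later with an opaque `AttributeError`. `checked_unlink`, `prev_free`, `next_free`, and `block_id` are assumed names, not vLLM's API.

```python
from types import SimpleNamespace


def checked_unlink(block) -> None:
    """O(1) unlink that rejects blocks not currently in the free list.

    With fake head/tail nodes, a linked real block always has non-None
    neighbours, so a None neighbour signals corrupted or stale state.
    """
    prev, nxt = block.prev_free, block.next_free
    if prev is None or nxt is None:
        raise RuntimeError(
            f"block {block.block_id} is not linked into the free list "
            "(double remove or corrupted state?)"
        )
    prev.next_free = nxt
    nxt.prev_free = prev
    block.prev_free = block.next_free = None


if __name__ == "__main__":
    head = SimpleNamespace(block_id=-1, prev_free=None, next_free=None)
    tail = SimpleNamespace(block_id=-1, prev_free=None, next_free=None)
    blk = SimpleNamespace(block_id=0, prev_free=head, next_free=tail)
    head.next_free, tail.prev_free = blk, blk
    checked_unlink(blk)      # ok: list is now head <-> tail
    try:
        checked_unlink(blk)  # second removal is caught
    except RuntimeError as err:
        print(err)
```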

vllm/v1/core/kv_cache_utils.py

…project#21005) Summary: # Optimizations As a common trick for doubly linked list implementation, introducing fake head and tail nodes would significantly reduce the implementation overhead, and help us to get rid of dataclass.__eq__ comparison easily. - No dataclass.__eq__ invocation - Shorter code - Branchless All these combined should yield significant perf improvement for this piece of code. # Observations Per vLLM profiling, kv_cache_manager.allocate_slots consumed non-negligible cost for each prefill. |{F1980260529}|{F1980260481}|{F1980260497}| By zooming in, we could see the stack of FreeKVCacheBlockQueue.popleft is non-trivial. popleft -> remove -> string.__eq__ which is mainly coming from dataclasses (i.e. KVCacheBlock) equal comparison. Per [dataclasses python library doc](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass) ``` dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False) eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If the class already defines __eq__(), this parameter is ignored. ``` Test Plan: # Result Typically, block_size is set to 16, so in production usage, we might likely allocate 10-1000 blocks. In this range, the optimization gave us up to ~1ms TTFT savings (the improvements are more significant on long inputs). |After|Before| |{F1980286936}|{F1980286941}| Rollback Plan: Reviewed By: CuiCoco Differential Revision: D78292345

facebook-github-bot · 2025-07-15T17:49:48Z

This pull request was exported from Phabricator. Differential Revision: D78292345

facebook-github-bot · 2025-07-15T17:53:57Z

This pull request was exported from Phabricator. Differential Revision: D78292345


Signed-off-by: Jialin Ouyang <jialino@meta.com>

JialinOuyang-Meta requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners July 15, 2025 17:37

mergify bot added the performance and v1 labels Jul 15, 2025

gemini-code-assist bot reviewed Jul 15, 2025


vllm/v1/core/kv_cache_utils.py (outdated, resolved)

vllm/v1/core/kv_cache_utils.py (resolved)

JialinOuyang-Meta force-pushed the export-D78292345 branch from 574454a to 53e5f05 July 15, 2025 17:49

JialinOuyang-Meta force-pushed the export-D78292345 branch from 53e5f05 to f0a6c84 July 15, 2025 17:49

JialinOuyang-Meta force-pushed the export-D78292345 branch from f0a6c84 to 36411c3 July 15, 2025 17:54

JialinOuyang-Meta changed the title from "Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue" to "[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue" Jul 15, 2025

JialinOuyang-Meta force-pushed the export-D78292345 branch from 36411c3 to 608f1f5 July 15, 2025 18:32

JialinOuyang-Meta added 2 commits July 15, 2025 12:33

Address precommit errors

13254b4

Signed-off-by: Jialin Ouyang <jialino@meta.com>

Address precommit failures

b061697

Signed-off-by: Jialin Ouyang <jialino@meta.com>
