
Commit 7153d88

[Feature] Impl v1 disaggregated prefill in ascend scheduler (#852)
Implement save KV cache logic for v1 disaggregated prefill in the Ascend scheduler.

This PR adds support for saving the KV cache in the Ascend scheduler, as part of the v1 disaggregated prefill design. Loading the KV cache is not implemented yet.

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
1 parent b434f37 commit 7153d88


1 file changed: 13 additions, 0 deletions


vllm_ascend/core/scheduler.py

Lines changed: 13 additions & 0 deletions
@@ -51,6 +51,11 @@ def __init__(
         self.scheduled_req_ids: set[str] = set()
         self.running: list[Request] = []
 
+        if self.vllm_config.kv_transfer_config is not None and \
+            self.vllm_config.kv_transfer_config.is_kv_consumer:
+            raise ValueError(
+                "AscendScheduler cannot be used for decode nodes. ")
+
     def schedule(self) -> SchedulerOutput:
         if self.scheduler_config.chunked_prefill_enabled:
             return super().schedule()
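The guard added to `__init__` can be exercised on its own. The sketch below uses simplified, hypothetical stand-ins for vLLM's config objects (a `VllmConfig` and a `KVTransferConfig` carrying a `kv_role` string); only the attribute names `kv_transfer_config` and `is_kv_consumer` come from the diff, and the real config class may derive `is_kv_consumer` differently. It shows that configs without KV transfer and producer (prefill) nodes are accepted, while consumer (decode) nodes are rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVTransferConfig:
    # Hypothetical stand-in: the role this node plays in KV transfer,
    # e.g. "kv_producer" (prefill) or "kv_consumer" (decode).
    kv_role: str

    @property
    def is_kv_consumer(self) -> bool:
        # Assumption: only the "kv_consumer" role counts as a consumer here.
        return self.kv_role == "kv_consumer"


@dataclass
class VllmConfig:
    # Hypothetical stand-in; None means disaggregated prefill is not configured.
    kv_transfer_config: Optional[KVTransferConfig] = None


def check_not_decode_node(vllm_config: VllmConfig) -> None:
    """Same condition as the guard added to AscendScheduler.__init__."""
    cfg = vllm_config.kv_transfer_config
    if cfg is not None and cfg.is_kv_consumer:
        raise ValueError("AscendScheduler cannot be used for decode nodes.")


# No KV transfer configured: accepted.
check_not_decode_node(VllmConfig())
# Prefill (producer) node: accepted.
check_not_decode_node(VllmConfig(KVTransferConfig(kv_role="kv_producer")))
# Decode (consumer) node: rejected.
try:
    check_not_decode_node(VllmConfig(KVTransferConfig(kv_role="kv_consumer")))
except ValueError as exc:
    print(exc)  # AscendScheduler cannot be used for decode nodes.
```

Failing fast in the constructor means a misconfigured decode node errors at startup rather than silently running a scheduler that, per this commit, only implements the save path.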
@@ -287,6 +292,14 @@ def skip_cur_request():
             grammar_bitmask=None,
         )
 
+        # NOTE(Kuntai): this function is designed for multiple purposes:
+        # 1. Plan the KV cache store
+        # 2. Wrap up all the KV cache load / save ops into an opaque object
+        # 3. Clear the internal states of the connector
+        if self.connector is not None:
+            meta = self.connector.build_connector_meta(scheduler_output)
+            scheduler_output.kv_connector_metadata = meta
+
         # Advance the number of computed tokens for the request AFTER
         # the request is scheduled.
         # 1. The scheduler_output of the current step has to include the
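The second hunk follows a build-then-attach pattern: once the step's `scheduler_output` is assembled, the connector condenses its pending work into one metadata object, and the scheduler stores that object on `scheduler_output.kv_connector_metadata`. Below is a minimal, save-only sketch of the pattern. `SaveOnlyConnector`, its `add_save` hook, `SaveOnlyConnectorMetadata`, and the stripped-down `SchedulerOutput` are hypothetical; only `build_connector_meta` and `kv_connector_metadata` are names taken from the diff. The three stages in the `NOTE(Kuntai)` comment map to the three commented stages inside `build_connector_meta`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SaveOnlyConnectorMetadata:
    # Hypothetical opaque bundle for workers: request IDs whose KV cache to save.
    reqs_to_save: list[str] = field(default_factory=list)


@dataclass
class SchedulerOutput:
    # Hypothetical, heavily stripped-down stand-in for vLLM's SchedulerOutput.
    scheduled_req_ids: list[str]
    kv_connector_metadata: Optional[SaveOnlyConnectorMetadata] = None


class SaveOnlyConnector:
    """Hypothetical scheduler-side connector that only plans KV cache saves."""

    def __init__(self) -> None:
        self._pending_saves: list[str] = []

    def add_save(self, req_id: str) -> None:
        # Hypothetical hook: record that this request's KV cache must be stored.
        self._pending_saves.append(req_id)

    def build_connector_meta(
            self, scheduler_output: SchedulerOutput) -> SaveOnlyConnectorMetadata:
        # scheduler_output is accepted to match the call in the diff; this
        # simplified sketch does not inspect it.
        # 1. Plan the KV cache store: snapshot every save recorded this step.
        planned = list(self._pending_saves)
        # 2. Wrap the planned ops into one opaque object.
        meta = SaveOnlyConnectorMetadata(reqs_to_save=planned)
        # 3. Clear internal state so the next step starts fresh.
        self._pending_saves.clear()
        return meta


connector = SaveOnlyConnector()
scheduler_output = SchedulerOutput(scheduled_req_ids=["req-0", "req-1"])
for req_id in scheduler_output.scheduled_req_ids:
    connector.add_save(req_id)

# Same shape as the lines added in the diff.
if connector is not None:
    meta = connector.build_connector_meta(scheduler_output)
    scheduler_output.kv_connector_metadata = meta

print(scheduler_output.kv_connector_metadata.reqs_to_save)  # ['req-0', 'req-1']
```

Copying the pending list before clearing it matters: handing the internal list itself to the metadata object would empty the metadata during stage 3.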
