
[Draft][WIP][Feature]cpu offload connector #1659


Open
wants to merge 1 commit into
base: main


@lidenghui1110 lidenghui1110 commented Jul 7, 2025

What this PR does / why we need it?

This PR implements a CPU offload connector that lets the NPU KV cache be offloaded to host DRAM.
This PR depends on vLLM changes that start a metadata-server process. The metadata-server manages cpu_kv_cache and exposes RPC functions for the connector to call. It is designed to support a shared KV cache across DP EngineCore instances.
The metadata-server code is still in progress. We are trying to implement it in vllm-ascend to avoid a long pull request merge cycle in vLLM.
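
To make the role of the metadata-server a little more concrete, here is a minimal in-process sketch of the kind of bookkeeping such a service might do: mapping block hashes to CPU block ids so that several engine cores can find the same cached block. This is not the PR's implementation. The class `ToyCpuKvMetadata` and its methods `lookup` and `allocate` are hypothetical names chosen for illustration, and a real server would expose them over RPC from a separate process.

```python
from typing import Dict, List, Optional


class ToyCpuKvMetadata:
    """Hypothetical stand-in for the metadata-server (illustration only).

    Tracks which CPU block holds the KV data for a given block hash.
    """

    def __init__(self, num_cpu_blocks: int) -> None:
        # Ids of CPU blocks that are not yet assigned to any hash.
        self._free: List[int] = list(range(num_cpu_blocks))
        # Block hash -> CPU block id, shared by all callers.
        self._by_hash: Dict[str, int] = {}

    def lookup(self, block_hash: str) -> Optional[int]:
        """Return the CPU block id for block_hash, or None on a miss."""
        return self._by_hash.get(block_hash)

    def allocate(self, block_hash: str) -> Optional[int]:
        """Reserve a CPU block for block_hash.

        Returns the existing id if the hash is already cached, a fresh id
        if a free block is available, or None when the pool is exhausted.
        """
        existing = self._by_hash.get(block_hash)
        if existing is not None:
            return existing
        if not self._free:
            return None
        block_id = self._free.pop()
        self._by_hash[block_hash] = block_id
        return block_id
```

In this toy version, two callers that allocate the same hash receive the same block id, which is the sharing behavior the description aims for. Eviction and cross-process RPC are deliberately left out.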

Does this PR introduce any user-facing change?

Users enable CPU offload with the following parameters:

 --kv-transfer-config \
    '{
        "kv_connector": "CPUOffloadingConnector",
        "kv_connector_module_path": "vllm_ascend.distributed.kv_transfer.cpu_offloading_connector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {"swap_in_threshold": 0, "cpu_swap_space_gb": 800}
    }'
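
Since the config is passed as a JSON string on the command line, quoting mistakes are easy to make. The sketch below builds the same JSON with Python's `json` module and shell-quotes it with `shlex.quote`. The keys and values are copied from the parameters above; the helper name `build_kv_transfer_config` is hypothetical and not part of vLLM or vllm-ascend.

```python
import json
import shlex


def build_kv_transfer_config(swap_in_threshold: int = 0,
                             cpu_swap_space_gb: int = 800) -> str:
    """Return the --kv-transfer-config value as a compact JSON string.

    Hypothetical helper; keys mirror the example in this PR description.
    """
    config = {
        "kv_connector": "CPUOffloadingConnector",
        "kv_connector_module_path":
            "vllm_ascend.distributed.kv_transfer.cpu_offloading_connector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {
            "swap_in_threshold": swap_in_threshold,
            "cpu_swap_space_gb": cpu_swap_space_gb,
        },
    }
    return json.dumps(config)


# Print a shell-safe argument pair that can be pasted into a launch command.
print("--kv-transfer-config " + shlex.quote(build_kv_transfer_config()))
```

Generating the string this way guarantees valid JSON and leaves the two `kv_connector_extra_config` values easy to change per deployment.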

How was this patch tested?

@lidenghui1110 lidenghui1110 changed the title [draft][wip][feature]cpu offload connector [Draft][WIP][Feature]cpu offload connector Jul 7, 2025

github-actions bot commented Jul 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.
