[1/N][UT][v1 MTP] add basic v1 mtp features #890
Conversation
Force-pushed from 4a85243 to 7d3ca5a
Force-pushed from 721b02d to edaf563
@@ -0,0 +1,230 @@
import threading
Please add a patch description in __init__.
@@ -228,7 +232,7 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
         # Remove this after we drop 0.8.5 support
-        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+        if vllm_version_is("0.8.5") or ("0.8.5.post1"):
This if statement will always be True, because a bare non-empty string such as "0.8.5.post1" is truthy on its own. Please change it back to the previous version:
if vllm_version_is("0.8.5") or ("0.8.5.post1"): | |
if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"): |
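To make the failure mode concrete, here is a minimal self-contained illustration (vllm_version_is is stubbed here, since the real helper lives in vllm_ascend.utils): any non-empty Python string is truthy, so the bare string operand makes the whole condition pass regardless of the installed version.

```python
# Minimal sketch of the truthiness bug; vllm_version_is is a stand-in
# for the real helper in vllm_ascend.utils.
def vllm_version_is(version: str) -> bool:
    installed = "0.9.0"  # pretend this is the installed vllm version
    return installed == version

buggy = vllm_version_is("0.8.5") or ("0.8.5.post1")
fixed = vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1")

print(bool(buggy))  # True: the non-empty string "0.8.5.post1" is truthy
print(bool(fixed))  # False: both version checks correctly fail on 0.9.0
```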
Force-pushed from 27f8f0f to 5771f55
import pytest
from vllm import LLM, SamplingParams

os.environ['VLLM_USE_MODELSCOPE'] = 'True'
If you add this environment variable, you should run this test in a separate process in CI, to avoid affecting other cases in the same process that do not use ModelScope.
Alternatively, clear the environment variable after the script has executed. In short, make sure this environment variable only takes effect for this file; a sketch of one way to do that follows.
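One way to follow this advice, sketched below under the assumption that the tests use pytest: an autouse fixture scopes the variable to this file and lets monkeypatch restore the environment automatically on teardown.

```python
import pytest

# Sketch: keep VLLM_USE_MODELSCOPE visible only to tests in this file.
# monkeypatch undoes setenv when each test finishes, so other cases in
# the same process never see the variable.
@pytest.fixture(autouse=True)
def use_modelscope(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("VLLM_USE_MODELSCOPE", "True")
```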
vllm_ascend/attention/mla_v1.py (Outdated)

 from vllm_ascend.attention.attention_v1 import AscendAttentionState
 from vllm_ascend.ops.attention import vanilla_chunked_prefill_mla
-from vllm_ascend.utils import vllm_version_is
+from vllm_ascend.utils import vllm_major_version_is, vllm_version_is
Why add this version-check function? Please explain.
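For context, such a helper usually compares only the leading components of vllm.__version__; a rough sketch of what it might look like follows (the actual implementation in vllm_ascend.utils may differ).

```python
# Hypothetical sketch of a major-version check; the real helper in
# vllm_ascend.utils may be implemented differently.
import vllm

def vllm_major_version_is(major: str) -> bool:
    # e.g. "0.8.5.post1" -> "0.8", then compare against the target.
    installed_major = ".".join(vllm.__version__.split(".")[:2])
    return installed_major == major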
vllm_ascend/attention/mla_v1.py (Outdated)
        # Convert from (L, N, P) to (N, P, L)
        self.W_UK_T = W_UK.permute(1, 2, 0).contiguous()
        self.W_UV.data = torch_npu.npu_format_cast(self.W_UV.data, 29)
        self.W_UK_T.data = torch_npu.npu_format_cast(self.W_UK_T.data, 29)
Why make this change?
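For background on what the quoted lines do, here is a CPU-only sketch with toy shapes; the dimension names are illustrative, and the npu_format_cast step is only described in comments because format 29 is an Ascend-private layout (commonly FRACTAL_NZ) that requires torch_npu and NPU hardware.

```python
import torch

# Toy shapes standing in for the MLA up-projection weight; the actual
# dimensions in mla_v1.py differ, this only shows the layout change.
L, N, P = 4, 8, 16
W_UK = torch.randn(L, N, P)

# Same transformation as the quoted code: (L, N, P) -> (N, P, L),
# materialized contiguously so downstream matmuls see a dense layout.
W_UK_T = W_UK.permute(1, 2, 0).contiguous()
assert W_UK_T.shape == (N, P, L)

# The review question concerns the extra npu_format_cast(..., 29) calls,
# which convert the weights to an NPU-tiled storage format on Ascend;
# that step has no CPU equivalent and is omitted here.
```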
Please rebase all your commits into one commit.
Force-pushed from 2a7968b to fb0db2b
Force-pushed from 7af2e72 to 56f8efb
You can rebase now. The CI error is fixed.
@@ -228,7 +232,7 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
         # Remove this after we drop 0.8.5 support
-        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+        if vllm_version_is("0.8.5") or ("0.8.5.post1"):
This change does not work.
@@ -0,0 +1,92 @@
from __future__ import annotations
Why add this import?
To avoid circular-import problems with type annotations.
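For readers unfamiliar with the pattern, a generic sketch (not the PR's actual module): the __future__ import defers annotation evaluation, so a type that is only needed for hints can be imported under typing.TYPE_CHECKING and never at runtime, which breaks the import cycle.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by type checkers, never executed, so a module that in
    # turn imports this one cannot create a circular import at runtime.
    from some_pkg.heavy_module import HeavyType  # hypothetical module


def process(item: HeavyType) -> None:
    # Thanks to the __future__ import, the annotation above stays a
    # string at runtime, so HeavyType need not be importable here.
    ...
```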
Co-authored-by: XWFAlone <xuewenfei2@huawei.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Co-authored-by: JC-ut0 <xuyexiong@huawei.com>
Signed-off-by: XWFAlone <xuewenfei2@huawei.com>
What this PR does / why we need it?
Add basic v1 MTP features.
Please merge it after #874 and #844.
Does this PR introduce any user-facing change?
We now support basic v1 MTP. Currently only TP, eager mode, and k=1 are supported; we will continue to expand to more scenarios.
How was this patch tested?
Tested locally. A hypothetical invocation sketch is shown below.
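For reviewers who want to exercise the feature, a hypothetical invocation sketch follows; the model name, the VLLM_USE_V1 toggle, and the exact speculative_config keys are assumptions about the surrounding vLLM setup rather than something this PR defines.

```python
import os

from vllm import LLM, SamplingParams

os.environ["VLLM_USE_V1"] = "1"  # assumption: select the v1 engine

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",   # placeholder MTP-capable checkpoint
    tensor_parallel_size=2,            # TP is the only supported parallelism
    enforce_eager=True,                # eager mode only for now
    speculative_config={
        "method": "deepseek_mtp",      # assumed method name
        "num_speculative_tokens": 1,   # k=1 is the only supported depth
    },
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```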