[1/N][UT][v1 MTP] add basic v1 mtp features #890
Conversation
Force-pushed from 4a85243 to 7d3ca5a
Force-pushed from 721b02d to edaf563
@@ -0,0 +1,230 @@
import threading
Please add a patch description in __init__.
@@ -228,7 +232,7 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
         # Remove this after we drop 0.8.5 support
-        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+        if vllm_version_is("0.8.5") or ("0.8.5.post1"):
This if statement will always be True, because a bare non-empty string such as "0.8.5.post1" is truthy on its own. Please change it back to the previous version:
if vllm_version_is("0.8.5") or ("0.8.5.post1"): | |
if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"): |
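To make the failure mode concrete, here is a minimal self-contained illustration (vllm_version_is is stubbed here, since the real helper lives in vllm_ascend.utils): any non-empty Python string is truthy, so the bare string operand makes the whole condition pass regardless of the installed version.

```python
# Minimal sketch of the truthiness bug; vllm_version_is is a stand-in
# for the real helper in vllm_ascend.utils.
def vllm_version_is(version: str) -> bool:
    installed = "0.9.0"  # pretend this is the installed vllm version
    return installed == version

buggy = vllm_version_is("0.8.5") or ("0.8.5.post1")
fixed = vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1")

print(bool(buggy))  # True: the non-empty string "0.8.5.post1" is truthy
print(bool(fixed))  # False: both version checks correctly fail on 0.9.0
```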
Force-pushed from 27f8f0f to 5771f55
import pytest
from vllm import LLM, SamplingParams

os.environ['VLLM_USE_MODELSCOPE'] = 'True'
If you add this environment variable, you should run this test in a separate process in CI, to avoid affecting other cases in the same process that do not use ModelScope.
Alternatively, clear the environment variable after the script has executed. In short, make sure this environment variable only takes effect for this file; a sketch of one way to do that follows.
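One way to follow this advice, sketched below under the assumption that the tests use pytest: an autouse fixture scopes the variable to this file and lets monkeypatch restore the environment automatically on teardown.

```python
import pytest

# Sketch: keep VLLM_USE_MODELSCOPE visible only to tests in this file.
# monkeypatch undoes setenv when each test finishes, so other cases in
# the same process never see the variable.
@pytest.fixture(autouse=True)
def use_modelscope(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("VLLM_USE_MODELSCOPE", "True")
```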
vllm_ascend/attention/mla_v1.py (Outdated)

 from vllm_ascend.attention.attention_v1 import AscendAttentionState
 from vllm_ascend.ops.attention import vanilla_chunked_prefill_mla
-from vllm_ascend.utils import vllm_version_is
+from vllm_ascend.utils import vllm_major_version_is, vllm_version_is
Why add this version-check function? Please explain.
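For context, such a helper usually compares only the leading components of vllm.__version__; a rough sketch of what it might look like follows (the actual implementation in vllm_ascend.utils may differ).

```python
# Hypothetical sketch of a major-version check; the real helper in
# vllm_ascend.utils may be implemented differently.
import vllm

def vllm_major_version_is(major: str) -> bool:
    # e.g. "0.8.5.post1" -> "0.8", then compare against the target.
    installed_major = ".".join(vllm.__version__.split(".")[:2])
    return installed_major == major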
vllm_ascend/attention/mla_v1.py (Outdated)
        # Convert from (L, N, P) to (N, P, L)
        self.W_UK_T = W_UK.permute(1, 2, 0).contiguous()
        self.W_UV.data = torch_npu.npu_format_cast(self.W_UV.data, 29)
        self.W_UK_T.data = torch_npu.npu_format_cast(self.W_UK_T.data, 29)
Why make this change?
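For background on what the quoted lines do, here is a CPU-only sketch with toy shapes; the dimension names are illustrative, and the npu_format_cast step is only described in comments because format 29 is an Ascend-private layout (commonly FRACTAL_NZ) that requires torch_npu and NPU hardware.

```python
import torch

# Toy shapes standing in for the MLA up-projection weight; the actual
# dimensions in mla_v1.py differ, this only shows the layout change.
L, N, P = 4, 8, 16
W_UK = torch.randn(L, N, P)

# Same transformation as the quoted code: (L, N, P) -> (N, P, L),
# materialized contiguously so downstream matmuls see a dense layout.
W_UK_T = W_UK.permute(1, 2, 0).contiguous()
assert W_UK_T.shape == (N, P, L)

# The review question concerns the extra npu_format_cast(..., 29) calls,
# which convert the weights to an NPU-tiled storage format on Ascend;
# that step has no CPU equivalent and is omitted here.
```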
Please rebase all your commits into one commit.
Force-pushed from 2a7968b to fb0db2b
Force-pushed from 7af2e72 to 56f8efb
You can rebase now. The CI error is fixed.
@@ -228,7 +232,7 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):
         self.requests: Dict[str, CachedRequestState] = {}
         # Persistent batch.
         # Remove this after we drop 0.8.5 support
-        if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
+        if vllm_version_is("0.8.5") or ("0.8.5.post1"):
This change does not work.
@@ -0,0 +1,92 @@
from __future__ import annotations
Why add this import?
To avoid circular-import problems with type annotations.
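For readers unfamiliar with the pattern, a generic sketch (not the PR's actual module): the __future__ import defers annotation evaluation, so a type that is only needed for hints can be imported under typing.TYPE_CHECKING and never at runtime, which breaks the import cycle.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by type checkers, never executed, so a module that in
    # turn imports this one cannot create a circular import at runtime.
    from some_pkg.heavy_module import HeavyType  # hypothetical module


def process(item: HeavyType) -> None:
    # Thanks to the __future__ import, the annotation above stays a
    # string at runtime, so HeavyType need not be importable here.
    ...
```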
Co-authored-by: XWFAlone <xuewenfei2@huawei.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
Co-authored-by: JC-ut0 <xuyexiong@huawei.com>
Signed-off-by: XWFAlone <xuewenfei2@huawei.com>
What this PR does / why we need it?
Add basic v1 MTP features.
Please merge it after #874 and #844.
Does this PR introduce any user-facing change?
We now support basic v1 MTP. Currently only TP, eager mode, and k=1 are supported; we will continue to expand to more scenarios.
How was this patch tested?
Tested locally. A hypothetical invocation sketch is shown below.
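For reviewers who want to exercise the feature, a hypothetical invocation sketch follows; the model name, the VLLM_USE_V1 toggle, and the exact speculative_config keys are assumptions about the surrounding vLLM setup rather than something this PR defines.

```python
import os

from vllm import LLM, SamplingParams

os.environ["VLLM_USE_V1"] = "1"  # assumption: select the v1 engine

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",   # placeholder MTP-capable checkpoint
    tensor_parallel_size=2,            # TP is the only supported parallelism
    enforce_eager=True,                # eager mode only for now
    speculative_config={
        "method": "deepseek_mtp",      # assumed method name
        "num_speculative_tokens": 1,   # k=1 is the only supported depth
    },
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```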