Commit eb39054

[Performance] Disable JIT and nd2nz to improve performance for Atlas 300I series (#1591)
### What this PR does / why we need it?

Now that running on the Atlas 300I Duo is initially supported after #1333, this PR disables the JIT compiler on the 310P and changes the weight data format to NZ for the vocabulary embedding and QKV projection layers, which helps improve performance. See #1563.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested manually: #1591 (comment)

Signed-off-by: Vincent Yuan <farawayboat@gmail.com>
1 parent dd22ac3 commit eb39054

1 file changed, +16 -0 lines changed

vllm_ascend/worker/model_runner_v1.py

Lines changed: 16 additions & 0 deletions
@@ -89,13 +89,17 @@
 else:
     xgr = LazyLoader("xgr", globals(), "xgrammar")
 
+import torch_npu
 import vllm.envs as envs_vllm
 
 import vllm_ascend.envs as envs_ascend
 
 if vllm_version_is("0.9.1"):
     from vllm.v1.spec_decode.utils import is_spec_decode_supported
 
+if is_310p():
+    torch_npu.npu.set_compile_mode(jit_compile=False)
+
 
 @dataclass
 class GraphCaptureContext:
@@ -2007,6 +2011,18 @@ def load_model(self) -> None:
 
         with DeviceMemoryProfiler() as m:  # noqa: SIM117
             self.model = get_model(vllm_config=self.vllm_config)
+
+            if is_310p():
+                from vllm.model_executor.layers.linear import (
+                    MergedColumnParallelLinear, QKVParallelLinear,
+                    RowParallelLinear)
+                for module in self.model.modules():
+                    if isinstance(module,
+                                  (MergedColumnParallelLinear,
+                                   QKVParallelLinear, RowParallelLinear)):
+                        module.weight.data = torch_npu.npu_format_cast(
+                            module.weight.data, ACL_FORMAT_FRACTAL_NZ)
+
             try:
                 # For version compatibility, remove this after we abort vllm v0.9.1 support
                 from vllm.model_executor.models.interfaces import \
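
The second hunk walks every submodule of the loaded model and re-casts the weight of selected linear layers in place. The sketch below illustrates only that traversal pattern, without NPU hardware or `torch_npu`. Every name in it (`FakeModule`, `FakeWeight`, `QKVLinear`, `RowLinear`, `LayerNorm`, `cast_selected_weights`, `to_nz`, and the string format tags) is a hypothetical stand-in introduced for illustration, not part of vLLM, vllm-ascend, or `torch_npu`. The "tensor" is reduced to a format tag, so the sketch says nothing about the actual NZ memory layout or its performance.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

# Illustrative format tags only; these are NOT the real ACL format constants.
ND, FRACTAL_NZ = "ND", "FRACTAL_NZ"


@dataclass
class FakeWeight:
    # Hypothetical stand-in for a parameter: only its format tag is tracked.
    data: str = ND


@dataclass
class FakeModule:
    # Hypothetical stand-in for torch.nn.Module.
    weight: FakeWeight = field(default_factory=FakeWeight)
    children: List["FakeModule"] = field(default_factory=list)

    def modules(self) -> Iterator["FakeModule"]:
        # Like torch.nn.Module.modules(): yield self, then all descendants.
        yield self
        for child in self.children:
            yield from child.modules()


# Hypothetical layer types standing in for the vLLM linear layer classes.
class QKVLinear(FakeModule): ...
class RowLinear(FakeModule): ...
class LayerNorm(FakeModule): ...


def cast_selected_weights(model: FakeModule,
                          layer_types: Tuple[type, ...],
                          cast: Callable[[str], str]) -> int:
    """Apply `cast` to the weight of every module matching `layer_types`.

    Returns the number of converted modules.
    """
    converted = 0
    for module in model.modules():
        if isinstance(module, layer_types):
            # In-place swap of the underlying data, as in the diff above.
            module.weight.data = cast(module.weight.data)
            converted += 1
    return converted


def to_nz(_data: str) -> str:
    # Stand-in for a format cast: just relabel the tag.
    return FRACTAL_NZ


model = FakeModule(children=[QKVLinear(), LayerNorm(), RowLinear()])
count = cast_selected_weights(model, (QKVLinear, RowLinear), to_nz)
```

Only the two matching layers are converted; the root module and the `LayerNorm` stand-in keep their original tag. Passing the layer types as a tuple mirrors the `isinstance(module, (...))` check in the patch and keeps the set of affected layers in one place.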
