Fix the device error when using ray and add initialize_cache to support vllm main #884
Conversation
zhuo97 commented on May 16, 2025:
- Remove RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES
- Add lazy init for vllm_ascend_C
Force-pushed from f085e48 to f71a8dd
vllm_ascend/ops/rotary_embedding.py (Outdated)

    def custom_rotary_embedding_enabled(query, neox_style, head_size):
        return query.dtype == torch.float16 and neox_style and head_size % 32 == 0 and CUSTOM_OP_ENABLED

    try_register_lib("vllm_ascend.vllm_ascend_C")
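The guard in the diff above decides whether the custom Ascend rotary-embedding kernel may be used. A self-contained sketch of that check (CUSTOM_OP_ENABLED is hard-coded here for illustration; in the real module it reflects whether the vllm_ascend_C extension loaded):

```python
import torch

# Assumption for this sketch: pretend the C extension loaded successfully.
# In vllm_ascend this flag is set during extension registration.
CUSTOM_OP_ENABLED = True


def custom_rotary_embedding_enabled(query: torch.Tensor, neox_style: bool,
                                    head_size: int) -> bool:
    # The custom kernel only handles fp16 queries, neox-style rotary
    # layout, and head sizes that are multiples of 32.
    return (query.dtype == torch.float16 and neox_style
            and head_size % 32 == 0 and CUSTOM_OP_ENABLED)
```

Any one failing condition falls back to the generic implementation, so the predicate is safe to call unconditionally.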
CI failed. I guess the problem is here: for the rotary_embedding test, this function should be called as well.
Sorry, I forgot to modify test_rotary_embedding.py to use try_register_lib("vllm_ascend.vllm_ascend_C") instead of import vllm_ascend.platform.
Force-pushed from d4c5e5f to 7d9155d
The CI failure has been fixed. Please rebase onto main again. Thanks.
tests/ops/test_rotary_embedding.py (Outdated)

    import vllm_ascend.platform  # noqa: F401
    from vllm_ascend.utils import try_register_lib

    try_register_lib("vllm_ascend.vllm_ascend_C")
Please pass lib_info as well, to keep the same message as before: "Failed to register custom ops, all custom ops will be disabled".
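For context, the helper under discussion behaves roughly like the sketch below (a minimal reconstruction, not the exact vllm_ascend.utils implementation): attempt to import the named extension module, and on failure log lib_info as a warning instead of raising.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_register_lib(lib_name: str, lib_info: str = "") -> None:
    """Best-effort import of an optional extension module.

    Sketch of vllm_ascend.utils.try_register_lib; the real helper may
    differ in logging details.
    """
    try:
        importlib.import_module(lib_name)
        logger.info("Successfully registered %s", lib_name)
    except Exception:
        # Swallow the failure: custom ops are optional, so callers
        # continue with the pure-Python fallback path.
        if lib_info:
            logger.warning(lib_info)
```

Passing lib_info keeps the user-facing warning identical to the old behavior when the extension is missing.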
OK, the information is added.
Force-pushed from 3023c18 to 970f649
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 8130d67 to 278532c
This pull request has conflicts, please resolve those before we can evaluate the pull request.
vllm_ascend/platform.py (Outdated)

    @@ -33,14 +32,6 @@
    from vllm_ascend.utils import ASCEND_QUATIZATION_METHOD, update_aclgraph_sizes

    CUSTOM_OP_ENABLED = False
I think this could be removed?
Yes, this variable has been removed.
    @@ -50,7 +41,6 @@
    VllmConfig = None
    FlexibleArgumentParser = None

    os.environ["RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES"] = "1"
TODO: Add a unit test for the ray backend. We should do this after this PR and #1235.
    import vllm_ascend.platform as pf

    pf.CUSTOM_OP_ENABLED = True
Suggested change:

    - import vllm_ascend.platform as pf
    - pf.CUSTOM_OP_ENABLED = True
    + import vllm_ascend.vllm_ascend_C
vllm_ascend/utils.py (Outdated)

    @@ -67,6 +67,22 @@ def try_register_lib(lib_name: str, lib_info: str = ""):
        pass


    def enable_custom_op():
        CUSTOM_OP_ENABLED = False
Let's add a comment here to remind developers to import vllm_ascend.vllm_ascend_C in examples or UTs of custom ops.
OK, the related comments have been added.
LGTM now, thanks!
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Please rebase onto main; the e2e tests have already been moved to the tests/e2e folder.
    pf.CUSTOM_OP_ENABLED = True  # set True for custom Ops of Multi-Step.
    enable_custom_op()
I think these two lines can be removed.
OK, these two lines have been removed.
    @@ -67,6 +67,26 @@ def try_register_lib(lib_name: str, lib_info: str = ""):
        pass


    def enable_custom_op():
Make CUSTOM_OP_ENABLED a global variable:

    CUSTOM_OP_ENABLED = None

    def enable_custom_op():
        global CUSTOM_OP_ENABLED
        if CUSTOM_OP_ENABLED is not None:
            return CUSTOM_OP_ENABLED
        else:
            xxxx
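Filled in, the suggested caching pattern looks like the sketch below. The import target vllm_ascend.vllm_ascend_C follows the earlier diffs; on a machine without the extension the probe simply caches and returns False, so repeated calls never retry the import.

```python
# Cached tri-state flag: None = not probed yet, True/False = probe result.
CUSTOM_OP_ENABLED = None


def enable_custom_op() -> bool:
    """Lazily load the custom-op C extension; cache and return the result.

    Note for developers: examples or UTs that rely on custom ops should
    call this (or import vllm_ascend.vllm_ascend_C) before using them.
    """
    global CUSTOM_OP_ENABLED
    if CUSTOM_OP_ENABLED is not None:
        return CUSTOM_OP_ENABLED
    try:
        import vllm_ascend.vllm_ascend_C  # noqa: F401
        CUSTOM_OP_ENABLED = True
    except ImportError:
        # Extension not built or not installed: fall back to Python ops.
        CUSTOM_OP_ENABLED = False
    return CUSTOM_OP_ENABLED
```

Caching the result in a module-level global is what lets callers such as custom_rotary_embedding_enabled check the flag cheaply on every invocation.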
Ok, I have changed CUSTOM_OP_ENABLED to a global variable.
Force-pushed from 3589b0e to 17efb65
CI is broken due to the vllm change vllm-project/vllm@1173804#diff-3dd8e96bc7c1aaf28faa13b3b705f4f7bdbf755aec34dcf4d9b67c933ddfb127. Can you fix it in this PR as well?
…or vllm_ascend_C Signed-off-by: zhuo97 <1103045176@qq.com>
…ect#884) 1. Remove RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES 2. Add lazy init for vllm_ascend_C Signed-off-by: zhuo97 <1103045176@qq.com>
…ect#884) 1. Remove RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES 2. Add lazy init for vllm_ascend_C Signed-off-by: zhuo97 <1103045176@qq.com> Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
…to main

* 'main' of https://github.com/vllm-project/vllm-ascend: (22 commits)
  [Bugfix] Remove cuda related lines and add additional pip mirror (vllm-project#1252)
  [refactor] Refactoring AscendFusedMoE (vllm-project#1229)
  [Doc] Refactor and init user story page (vllm-project#1224)
  [Doctest] add installation doctest (vllm-project#1179)
  [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (vllm-project#1235)
  Fix the device error when using ray as vllm-acend backend (vllm-project#884)
  [CI] Add unit test framework (vllm-project#1201)
  [Build] Speedup image build (vllm-project#1216)
  [CI] Make e2e test to be preemptible and simple (vllm-project#1217)
  Waiting for BMM NZ support(Improve TPOP 2ms performance) (vllm-project#1131)
  [Doc] fix VLLM_USE_V1 value in graph mode docs (vllm-project#1226)
  vllm-ascend support chunked prefill (vllm-project#1172)
  [CI/UT][Graph] Add ut for torchair graph mode (vllm-project#1103)
  Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (vllm-project#1203)
  [CI] Recover ut for ascend scheduler only in ci of v1. (vllm-project#1180)
  Support multistream of MLA vector operations (vllm-project#1135)
  [Doc] Add Referer header for CANN package download url. (vllm-project#1192)
  [fix] fix bug in 1p1d disaggregated_prefill example (vllm-project#1184)
  [CI][Benchmark] Add qwen2.5-7b test (vllm-project#1104)
  [CI][Benchmark] Add new model and v1 test to perf benchmarks (vllm-project#1099)
  ...

Sync with upstream main branch.