INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:11 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:12 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
WARNING 04-28 05:12:22 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda1904f0>
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c250>
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c1c0>
WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda190460>
[rank1]:[W428 05:12:30.777485493 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.778197299 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_8211b83c'), local_subscribe_addr='ipc:///tmp/d3502128-3222-4794-a17f-185139b45d4e', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
[rank1]:[W428 05:12:30.931797872 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.938525886 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_00d5acb5'), local_subscribe_addr='ipc:///tmp/31ca084d-c176-4202-a16c-0a04f3b96f29', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
decode: local rank 0
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0'}
(VllmWorkerProcess pid=18200) decode: local rank 1
(VllmWorkerProcess pid=18200) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1'}
(VllmWorkerProcess pid=18268) prompt: local rank 1, ip = 192.168.128.92
(VllmWorkerProcess pid=18268) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000'}
prompt: local rank 0, ip = 192.168.128.91
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0', 'llm.listenIpInfo': '192.168.128.91:26000'}
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
target ip = 192.168.128.91
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
INFO 04-28 05:12:41 [llmdatadist_connector.py:169] local_rank 0 link, ret=[<LLMStatusCode.LLM_SUCCESS: 0>]
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.worker.init_device() # type: ignore
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 162, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.data_dist.init(options)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.worker.init_device() # type: ignore
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 160, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.data_dist.init(options)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Process Process-2:
self.run()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 84, in run
for result in iter(self.result_queue.get, _TERMINATE):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/queues.py", line 103, in get
res = self._recv_bytes()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 421, in _recv_bytes
return self._recv(size)
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
TypeError: 'NoneType' object cannot be interpreted as an integer
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 93, in run_decode
llm = LLM(model="/models/Qwen2-7B-Instruct",
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
return fn(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
return engine_cls.from_vllm_config(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
return cls(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
self._run_workers("init_device")
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Process Process-1:
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 54, in run_prefill
llm = LLM(model="/models/Qwen2-7B-Instruct",
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
return fn(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
return engine_cls.from_vllm_config(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
return cls(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
self._run_workers("init_device")
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
/usr/local/python3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
All process done!
It seems that DataDist cannot be initialized, even though the options passed when initializing DataDist appear correct.
Here are the contents of the most recent log file under /root/ascend/log/debug/plog/:
As far as I know, disaggregated prefill is currently not functioning. You can check the ongoing work on this issue: for V0, refer to #694 and #658; for V1, refer to #684.
Your current environment
I am using vllm-v0.8.4 and vllm-ascend-v0.8.4rc1.
Environment:
🐛 Describe the bug
I am trying the PD separation feature with the Qwen2-7B-Instruct model and these commands, run from the vllm-ascend directory:
Here are the logs and error messages (included at the top of this report):
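For context, here is a rough sketch of what the prefill and decode sides of examples/offline_disaggregated_prefill_npu.py look like, reconstructed from the traceback and the logged engine config. The connector name, prompts, sampling parameters, and the kv_rank/kv_parallel_size values are assumptions for illustration, not the exact script contents:

```python
# Rough sketch of examples/offline_disaggregated_prefill_npu.py, reconstructed
# from the traceback and the logged engine config. The connector name and most
# parameter values below are assumptions, not the exact script contents.
import multiprocessing as mp

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig


def run_prefill(prompts):
    # The "Prompt" role in the llm_datadist options maps to the KV producer side.
    llm = LLM(
        model="/models/Qwen2-7B-Instruct",
        tensor_parallel_size=2,
        max_model_len=2000,
        dtype="bfloat16",
        kv_transfer_config=KVTransferConfig(
            kv_connector="AscendSimpleConnector",  # assumed connector name
            kv_role="kv_producer",
            kv_rank=0,
            kv_parallel_size=2,
        ),
    )
    llm.generate(prompts, SamplingParams(max_tokens=1))


def run_decode(prompts):
    # The "Decoder" role maps to the KV consumer side.
    llm = LLM(
        model="/models/Qwen2-7B-Instruct",
        tensor_parallel_size=2,
        max_model_len=2000,
        dtype="bfloat16",
        kv_transfer_config=KVTransferConfig(
            kv_connector="AscendSimpleConnector",  # assumed connector name
            kv_role="kv_consumer",
            kv_rank=1,
            kv_parallel_size=2,
        ),
    )
    print(llm.generate(prompts, SamplingParams(max_tokens=64)))


if __name__ == "__main__":
    prompts = ["Hello, my name is"]
    procs = [mp.Process(target=run_prefill, args=(prompts,)),
             mp.Process(target=run_decode, args=(prompts,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("All process done!")
```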
Could you please help us find out what the problem is? Or is there a bug in the PD separation feature? Thank you.
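As a diagnostic aid, here is a minimal standalone sketch that exercises the same LLMDataDist.init call outside of vLLM, assuming the llm_datadist v2 API shown in the traceback (an LLMDataDist(role, cluster_id) object with an init(options) method and an LLMRole enum). The option values are copied from the decode-side error message; if this also fails, the problem likely lies in the CANN/llm_datadist environment or the resource config file rather than in vllm-ascend itself:

```python
# Standalone reproduction attempt for the failing LLMDataDist.init call.
# Assumes the llm_datadist v2 API seen in the traceback; option values are
# copied verbatim from the decode-side error message above.
from llm_datadist import LLMDataDist, LLMRole

options = {
    'llm.SyncKvCacheWaitTime': '5000',
    'ge.exec.deviceId': '1',
    'llm.Role': 'Decoder',
    'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}',
    'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json',
}

data_dist = LLMDataDist(LLMRole.DECODER, 1)  # role and cluster_id as used by the connector
data_dist.init(options)  # this is the call that raises LLMException(LLM_FAILED)
print("llm_datadist initialized successfully")
```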