INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:11 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:12 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
WARNING 04-28 05:12:22 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda1904f0>
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c250>
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c1c0>
WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda190460>
[rank1]:[W428 05:12:30.777485493 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.778197299 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_8211b83c'), local_subscribe_addr='ipc:///tmp/d3502128-3222-4794-a17f-185139b45d4e', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
[rank1]:[W428 05:12:30.931797872 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.938525886 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_00d5acb5'), local_subscribe_addr='ipc:///tmp/31ca084d-c176-4202-a16c-0a04f3b96f29', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
decode: local rank 0
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0'}
(VllmWorkerProcess pid=18200) decode: local rank 1
(VllmWorkerProcess pid=18200) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1'}
(VllmWorkerProcess pid=18268) prompt: local rank 1, ip = 192.168.128.92
(VllmWorkerProcess pid=18268) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000'}
prompt: local rank 0, ip = 192.168.128.91
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0', 'llm.listenIpInfo': '192.168.128.91:26000'}
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
target ip = 192.168.128.91
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
INFO 04-28 05:12:41 [llmdatadist_connector.py:169] local_rank 0 link, ret=[<LLMStatusCode.LLM_SUCCESS: 0>]
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.worker.init_device() # type: ignore
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 162, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.data_dist.init(options)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.worker.init_device() # type: ignore
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 160, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] self.data_dist.init(options)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Process Process-2:
self.run()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 84, in run
for result in iter(self.result_queue.get, _TERMINATE):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/queues.py", line 103, in get
res = self._recv_bytes()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 421, in _recv_bytes
return self._recv(size)
File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
TypeError: 'NoneType' object cannot be interpreted as an integer
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 93, in run_decode
llm = LLM(model="/models/Qwen2-7B-Instruct",
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
return fn(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
return engine_cls.from_vllm_config(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
return cls(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
self._run_workers("init_device")
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Process Process-1:
Traceback (most recent call last):
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 54, in run_prefill
llm = LLM(model="/models/Qwen2-7B-Instruct",
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
return fn(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
return engine_cls.from_vllm_config(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
return cls(
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
self._run_workers("init_device")
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
] + [output.get() for output in worker_outputs]
File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
/usr/local/python3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
All process done!
It seems that DataDist cannot be initialized, even though the options passed when initializing DataDist appear correct.
Here are the contents of the most recent log file under /root/ascend/log/debug/plog/:
As far as I know, disaggregated prefill is currently not functioning. You can check the ongoing work on this issue: for V0, refer to #694 and #658; for V1, refer to #684.
Your current environment
I am using vllm-v0.8.4 and vllm-ascend-v0.8.4rc1.
Environment:
🐛 Describe the bug
I am trying the PD separation feature with the Qwen2-7B-Instruct model and these commands, run from the vllm-ascend directory:
Here are the logs and error messages (included at the top of this report):
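For context, here is a rough sketch of what the prefill and decode sides of examples/offline_disaggregated_prefill_npu.py look like, reconstructed from the traceback and the logged engine config. The connector name, prompts, sampling parameters, and the kv_rank/kv_parallel_size values are assumptions for illustration, not the exact script contents:

```python
# Rough sketch of examples/offline_disaggregated_prefill_npu.py, reconstructed
# from the traceback and the logged engine config. The connector name and most
# parameter values below are assumptions, not the exact script contents.
import multiprocessing as mp

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig


def run_prefill(prompts):
    # The "Prompt" role in the llm_datadist options maps to the KV producer side.
    llm = LLM(
        model="/models/Qwen2-7B-Instruct",
        tensor_parallel_size=2,
        max_model_len=2000,
        dtype="bfloat16",
        kv_transfer_config=KVTransferConfig(
            kv_connector="AscendSimpleConnector",  # assumed connector name
            kv_role="kv_producer",
            kv_rank=0,
            kv_parallel_size=2,
        ),
    )
    llm.generate(prompts, SamplingParams(max_tokens=1))


def run_decode(prompts):
    # The "Decoder" role maps to the KV consumer side.
    llm = LLM(
        model="/models/Qwen2-7B-Instruct",
        tensor_parallel_size=2,
        max_model_len=2000,
        dtype="bfloat16",
        kv_transfer_config=KVTransferConfig(
            kv_connector="AscendSimpleConnector",  # assumed connector name
            kv_role="kv_consumer",
            kv_rank=1,
            kv_parallel_size=2,
        ),
    )
    print(llm.generate(prompts, SamplingParams(max_tokens=64)))


if __name__ == "__main__":
    prompts = ["Hello, my name is"]
    procs = [mp.Process(target=run_prefill, args=(prompts,)),
             mp.Process(target=run_decode, args=(prompts,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("All process done!")
```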
Could you please help us find out what the problem is? Or is there a bug in the PD separation feature? Thank you.
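As a diagnostic aid, here is a minimal standalone sketch that exercises the same LLMDataDist.init call outside of vLLM, assuming the llm_datadist v2 API shown in the traceback (an LLMDataDist(role, cluster_id) object with an init(options) method and an LLMRole enum). The option values are copied from the decode-side error message; if this also fails, the problem likely lies in the CANN/llm_datadist environment or the resource config file rather than in vllm-ascend itself:

```python
# Standalone reproduction attempt for the failing LLMDataDist.init call.
# Assumes the llm_datadist v2 API seen in the traceback; option values are
# copied verbatim from the decode-side error message above.
from llm_datadist import LLMDataDist, LLMRole

options = {
    'llm.SyncKvCacheWaitTime': '5000',
    'ge.exec.deviceId': '1',
    'llm.Role': 'Decoder',
    'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}',
    'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json',
}

data_dist = LLMDataDist(LLMRole.DECODER, 1)  # role and cluster_id as used by the connector
data_dist.init(options)  # this is the call that raises LLMException(LLM_FAILED)
print("llm_datadist initialized successfully")
```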