[Bug]: Cannot use PD separation feature with v0.8.4rc1 #696

Open
gudiandian opened this issue Apr 28, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@gudiandian

Your current environment

I am using vllm-v0.8.4 and vllm-ascend-v0.8.4rc1.
Environment:

PyTorch version: 2.5.1
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.35

Python version: 3.10.15 (main, Nov 27 2024, 06:51:55) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-182.0.0.95.oe2203sp3.aarch64-aarch64-with-glibc2.35

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             192
On-line CPU(s) list:                0-191
Vendor ID:                          HiSilicon
BIOS Vendor ID:                     HiSilicon
Model name:                         Kunpeng-920
BIOS Model name:                    HUAWEI Kunpeng 920 5250
Model:                              0
Thread(s) per core:                 1
Core(s) per socket:                 48
Socket(s):                          4
Stepping:                           0x1
Frequency boost:                    disabled
CPU max MHz:                        2600.0000
CPU min MHz:                        200.0000
BogoMIPS:                           200.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                          12 MiB (192 instances)
L1i cache:                          12 MiB (192 instances)
L2 cache:                           96 MiB (192 instances)
L3 cache:                           192 MiB (8 instances)
NUMA node(s):                       8
NUMA node0 CPU(s):                  0-23
NUMA node1 CPU(s):                  24-47
NUMA node2 CPU(s):                  48-71
NUMA node3 CPU(s):                  72-95
NUMA node4 CPU(s):                  96-119
NUMA node5 CPU(s):                  120-143
NUMA node6 CPU(s):                  144-167
NUMA node7 CPU(s):                  168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.dev20250320
[pip3] torchvision==0.20.1
[pip3] transformers==4.51.3
[conda] Could not collect
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc3                 Version: 24.1.rc3                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 97.3        45                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3371 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 94.5        45                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3371 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 88.4        44                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3370 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 94.0        46                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 95.0        46                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 94.2        47                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 93.6        46                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 99.3        48                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.0.0
innerversion=V100R001C20SPC001B251
compatible_version=[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19],[V100R001C20]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux

🐛 Describe the bug

I am trying the PD separation feature with the Qwen2-7B-Instruct model, using the following commands from the vllm-ascend directory:

export PROMPT_DEVICE_ID=0,1
export DECODE_DEVICE_ID=2,3
python examples/offline_disaggregated_prefill_npu.py
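
For reference, the example script essentially starts one prefill process and one decode process, each constructing its own LLM with a kv_transfer_config. Below is a simplified sketch reconstructed from the traceback and engine logs; the connector name and the exact KVTransferConfig fields are my assumptions, not the shipped file's contents:

# Simplified sketch of examples/offline_disaggregated_prefill_npu.py,
# reconstructed from the traceback. The connector name and config fields
# are assumptions; tensor_parallel_size=2 and max_model_len=2000 match
# the engine log below.
import os
from multiprocessing import Process

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

MODEL = "/models/Qwen2-7B-Instruct"

def run_role(kv_role: str, device_env: str) -> None:
    # Pin each role to its own NPUs (device-selection mechanism assumed).
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = os.environ[device_env]
    llm = LLM(
        model=MODEL,
        tensor_parallel_size=2,
        max_model_len=2000,
        kv_transfer_config=KVTransferConfig(
            kv_connector="AscendSimpleConnector",  # name assumed
            kv_role=kv_role,  # "kv_producer" (prefill) / "kv_consumer" (decode)
            kv_parallel_size=2,
        ),
    )
    llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))

if __name__ == "__main__":
    prefill = Process(target=run_role, args=("kv_producer", "PROMPT_DEVICE_ID"))
    decode = Process(target=run_role, args=("kv_consumer", "DECODE_DEVICE_ID"))
    prefill.start()
    decode.start()
    prefill.join()
    decode.join()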

Here are the logs and error messages:

INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:09 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 05:12:09 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 05:12:09 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 05:12:09 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:09 [__init__.py:44] plugin ascend loaded.
INFO 04-28 05:12:09 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:11 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:11 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:11 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 05:12:11 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 05:12:11 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 05:12:11 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 05:12:11 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 05:12:11 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:12 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 05:12:12 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 04-28 05:12:22 [config.py:689] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 04-28 05:12:22 [arg_utils.py:1731] --kv-transfer-config is not supported by the V1 Engine. Falling back to V0.
INFO 04-28 05:12:22 [config.py:1713] Defaulting to use mp for distributed inference
INFO 04-28 05:12:22 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 05:12:22 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 05:12:22 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 05:12:22 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2000, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/models/Qwen2-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
WARNING 04-28 05:12:22 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:22 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:22 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:22 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:23 [camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:32] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:23 [patch_tritonplaceholder.py:43] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:23 [patch_tritonplaceholder.py:68] Triton module has been replaced with a placeholder.
WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda1904f0>
(VllmWorkerProcess pid=18200) WARNING 04-28 05:12:23 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c250>
(VllmWorkerProcess pid=18268) WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda18c1c0>
WARNING 04-28 05:12:24 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffdda190460>
[rank1]:[W428 05:12:30.777485493 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.778197299 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_8211b83c'), local_subscribe_addr='ipc:///tmp/d3502128-3222-4794-a17f-185139b45d4e', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18200) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
[rank1]:[W428 05:12:30.931797872 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank0]:[W428 05:12:30.938525886 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-28 05:12:30 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_00d5acb5'), local_subscribe_addr='ipc:///tmp/31ca084d-c176-4202-a16c-0a04f3b96f29', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=18268) INFO 04-28 05:12:30 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-28 05:12:30 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
decode: local rank 0
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0'}
(VllmWorkerProcess pid=18200) decode: local rank 1
(VllmWorkerProcess pid=18200) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1'}
(VllmWorkerProcess pid=18268) prompt: local rank 1, ip = 192.168.128.92
(VllmWorkerProcess pid=18268) options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000'}
prompt: local rank 0, ip = 192.168.128.91
options: {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '0', 'llm.listenIpInfo': '192.168.128.91:26000'}
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
target ip = 192.168.128.91
INFO 04-28 05:12:41 [llmdatadist_connector.py:118] 0/2 rank data dist is ready
INFO 04-28 05:12:41 [llmdatadist_connector.py:169] local_rank 0 link, ret=[<LLMStatusCode.LLM_SUCCESS: 0>]
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.worker.init_device()  # type: ignore
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 162, in __init__
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.data_dist.init(options)
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18200) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.worker.init_device()  # type: ignore
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 192, in init_device
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self._init_worker_distributed_environment(self.vllm_config, self.rank,
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 521, in _init_worker_distributed_environment
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     ensure_kv_transfer_initialized(vllm_config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 979, in ensure_kv_transfer_initialized
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     _KV_TRANSFER = kv_transfer.KVTransferAgent(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_transfer_agent.py", line 49, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.connector = KVConnectorFactory.create_connector(
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 36, in create_connector
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     return connector_cls(rank, local_rank, config)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 160, in __init__
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.llm_datadist_engine.prepare_data_dist()
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/workspace/vllm-ascend/vllm_ascend/distributed/llmdatadist_connector.py", line 116, in prepare_data_dist
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     self.data_dist.init(options)
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/v2/llm_datadist.py", line 284, in init
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     handle_llm_status(ret, '[LLMDataDist.init]', f'Failed to initialize llm datadist, options = {options}')
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]   File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/llm_datadist/status.py", line 121, in handle_llm_status
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238]     raise LLMException(f"{func_name} failed, error code is {code_2_status(status)}, {other_info}.",
(VllmWorkerProcess pid=18268) ERROR 04-28 05:12:44 [multiproc_worker_utils.py:238] llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Process Process-2:
    self.run()
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 84, in run
    for result in iter(self.result_queue.get, _TERMINATE):
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/queues.py", line 103, in get
    res = self._recv_bytes()
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 421, in _recv_bytes
    return self._recv(size)
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
TypeError: 'NoneType' object cannot be interpreted as an integer
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 93, in run_decode
    llm = LLM(model="/models/Qwen2-7B-Instruct",
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
    return fn(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
    return cls(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
    self._run_workers("init_device")
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
    ] + [output.get() for output in worker_outputs]
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
    ] + [output.get() for output in worker_outputs]
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
    raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.Role': 'Decoder', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_decoder_1.json'}.
INFO 04-28 05:12:44 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 54, in run_prefill
    llm = LLM(model="/models/Qwen2-7B-Instruct",
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/utils.py", line 1099, in inner
    return fn(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 522, in from_engine_args
    return engine_cls.from_vllm_config(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 498, in from_vllm_config
    return cls(
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 282, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 124, in _init_executor
    self._run_workers("init_device")
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in _run_workers
    ] + [output.get() for output in worker_outputs]
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 190, in <listcomp>
    ] + [output.get() for output in worker_outputs]
  File "/usr/local/python3.10/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 58, in get
    raise self.result.exception
llm_datadist.status.LLMException: [LLMDataDist.init] failed, error code is LLMStatusCode.LLM_FAILED, Failed to initialize llm datadist, options = {'llm.SyncKvCacheWaitTime': '5000', 'ge.exec.deviceId': '1', 'llm.listenIpInfo': '192.168.128.92:26000', 'llm.Role': 'Prompt', 'llm.ClusterInfo': '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"], "listen_ip_info": [{"ip": 1551935680, "port": 26000}]}', 'ge.resourceConfigPath': '/workspace/vllm-ascend/stub_numa_config_prompt_1.json'}.
/usr/local/python3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
All process done!

It seems that DataDist cannot be initialized, even though the options passed to DataDist initialization look correct.
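
For isolation, the failing call reduces to a few lines outside vLLM. This is a hypothetical reproduction pieced together from the traceback: the LLMDataDist constructor signature and the LLMRole import are assumptions on my part, while the options dict is copied verbatim from the error message:

# Hypothetical standalone reproduction of the failing init call; the
# LLMDataDist(role, cluster_id) signature is assumed, and the options
# are taken from the exception text above.
from llm_datadist import LLMDataDist, LLMRole

data_dist = LLMDataDist(LLMRole.DECODER, cluster_id=1)
options = {
    "llm.SyncKvCacheWaitTime": "5000",
    "ge.exec.deviceId": "1",
    "llm.Role": "Decoder",
    "llm.ClusterInfo": '{"cluster_id": 1, "logic_device_id": ["0:0:0:0"]}',
    "ge.resourceConfigPath": "/workspace/vllm-ascend/stub_numa_config_decoder_1.json",
}
data_dist.init(options)  # fails with LLMStatusCode.LLM_FAILED in our environment
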
Here are the contents of the most recent log file under /root/ascend/log/debug/plog/:

[ERROR] GE(10280,python):2025-04-28-04:18:56.100.107 [subprocess_manager.cc:181]11592 operator(): ErrorNo: 4294967295(failed) Unsupported process type: queue_schedule
[ERROR] GE(10280,python):2025-04-28-04:18:56.100.139 [flowgw_client.cc:69]11639 InnerStartFlowGw: ErrorNo: 4294967295(failed) Failed to fork flowgw.
[ERROR] GE(10280,python):2025-04-28-04:18:56.100.146 [flowgw_client.cc:178]11639 StartFlowGw: ErrorNo: 4294967295(failed) Failed to start flowgw, device_id = 0, memory group = DM_QS_GROUP_10280.
[ERROR] GE(10280,python):2025-04-28-04:18:56.100.151 [flowgw_client.cc:185]11639 Initialize: ErrorNo: 4294967295(failed) Failed to start flowgw client, device_id = 0, is_proxy = 0.
[ERROR] GE(10280,python):2025-04-28-04:18:56.100.156 [flowgw_client_manager.cc:44]11639 GetOrCreateClient: ErrorNo: 4294967295(failed) Failed to initialize flowgw client.
[ERROR] GE(10280,python):2025-04-28-04:18:56.100.235 [deploy_context.cc:1291]11639 InitProcessResource: ErrorNo: 4294967295(failed) [Check][Param:client]null is invalid
[ERROR] GE(10280,python):2025-04-28-04:18:56.101.889 [deployer_service_impl.cc:170]11639 Process: ErrorNo: 4294967295(failed) [Process][Request] failed, client = local_context_10280, type = kInitProcessResource, ret = 4294967295
[ERROR] GE(10280,python):2025-04-28-04:18:56.101.905 [deployer_proxy.cc:104]11639 SendRequest: ErrorNo: 4294967295(failed) Failed to process request, node_id = 0, request type = 9, error code = 4294967295, node_type = local, client_id = -1.
[ERROR] GE(10280,python):2025-04-28-04:18:56.101.946 [master_model_deployer.cc:580]11639 InitProcessResourceRequest: ErrorNo: 4294967295(failed) [InitProcessResource] failed, node id=0, error code=4294967295, error message=Failed to deploy rank table..
[ERROR] GE(10280,python):2025-04-28-04:18:56.101.955 [master_model_deployer.cc:240]11639 operator(): ErrorNo: 4294967295(failed) Failed to deploy rank table
[ERROR] GE(10280,python):2025-04-28-04:18:56.101.978 [master_model_deployer.cc:246]11612 InitFlowGwInfo: ErrorNo: 4294967295(failed) [LOAD][LOAD]Failed to deploy rank table.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.301 [master_model_deployer.cc:409]11612 DeployModel: ErrorNo: 4294967295(failed) [LOAD][LOAD]
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.335 [model_executor.cc:1182]11612 FlowModelLoad: ErrorNo: 4294967295(failed) [LOAD][LOAD]Failed to load model: LlmGraph_1
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.342 [model_executor.cc:695]11612 ModelLoad: ErrorNo: 4294967295(failed) [LOAD][LOAD][Load][ModelOnline] Failed, model_id:4294967295
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.350 [graph_manager.cc:1150]11612 StartForRunGraph: ErrorNo: 4294967295(failed) [LOAD][LOAD][Load][Graph] Failed, graph_id:0.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.355 [graph_manager.cc:1734]11612 BuildGraph: ErrorNo: 4294967295(failed) [LOAD][LOAD][Call][StartForRunGraph] failed! graph_id:0.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.360 [inner_session.cc:832]11612 BuildGraph: ErrorNo: 4294967295(failed) [LOAD][LOAD][Build][Graph] failed, InnerSession:1 graph_id=0.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.466 [ge_api.cc:1012]11612 BuildGraph: ErrorNo: 4294967295(failed) [LOAD][LOAD]Build graph failed, error code:4294967295, session_id:1, graph_id:0.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.475 [flow_graph_manager.cc:915]11612 BuildFlowGraph: ErrorNo: 4294967295(failed) [LOAD][LOAD][PromptManager] BuildForFlowGraph failed.
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.485 [llm_flow_service.cc:64]11612 LoadFlowFuncs: ErrorNo: 4294967295(failed) [LOAD][LOAD]
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.501 [llm_flow_service.cc:263]11612 operator(): ErrorNo: 4294967295(failed) [LOAD][LOAD]
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.535 [llm_flow_service.cc:281]10280 GetAndCheckFutures: ErrorNo: 4294967295(failed) [INIT][DEFAULT]execute failed, device index = 0
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.546 [llm_flow_service.cc:288]10280 GetAndCheckFutures: ErrorNo: 4294967295(failed) [INIT][DEFAULT]Failed to execute in all devices, success/total = 0/1
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.551 [llm_flow_service.cc:268]10280 LoadDataFlow: ErrorNo: 4294967295(failed) [INIT][DEFAULT]Failed to load data flow in all devices
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.686 [llm_flow_service.cc:175]10280 Initialize: ErrorNo: 4294967295(failed) [INIT][DEFAULT]
[ERROR] GE(10280,python):2025-04-28-04:19:02.844.699 [llm_datadist.cc:244]10280 InitFlowService: ErrorNo: 4294967295(failed) [INIT][DEFAULT]Failed to initialize LLM flow service
[ERROR] GE(10280,python):2025-04-28-04:19:04.874.662 [llm_datadist.cc:100]10280 LLMDataDistInitialize: ErrorNo: 4294967295(failed) [INIT][DEFAULT][Manager] initialize failed.
[ERROR] GE(10280,python):2025-04-28-04:19:04.874.674 [llm_datadist_wrapper.cc:156]10280 Init: ErrorNo: 4294967295(failed) [INIT][DEFAULT]

Could you please help us figure out what the problem is? Or is there a bug in the PD separation feature? Thank you.

gudiandian added the bug label on Apr 28, 2025
@jianzs (Collaborator) commented on Apr 28, 2025

As far as I know, disaggregated prefill is currently not functional. You can follow the ongoing work on this: for v0, see #694 and #658; for v1, see #684.

@Yikun (Collaborator) commented on May 16, 2025

We track P/D Disaggregation Support here.
