Skip to content

[Bug]: vllm启动qwen3-32B-AWQ后进行对话报错 #24959

@hackerHiJu

Description

@hackerHiJu

Your current environment

vllm :0.10.1.1
transformers :4.56.1

启动命令
export CUDA_VISIBLE_DEVICES=0,1,2,3 && nohup vllm serve Qwen3-32B-AWQ
--tensor-parallel-size 4
--dtype half
--cpu-offload-gb 1
--gpu-memory-utilization 0.9
--host 0.0.0.0
--enable-auto-tool-choice
--tool-call-parser hermes
--port 10000 > qwen3-32B.log 2>&1 & disown && tail -fn 200 qwen3-32B.log

🐛 Describe the bug

当模型启动成功进行对话时,报错

error: Failures have been detected while processing an MLIR pass pipeline
/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/prefix_prefill.py:36:0: note: Pipeline failed while executing [ConvertTritonGPUToLLVM on 'builtin.module' operation]: reproducer generated at std::errs, please share the reproducer above with Triton project.
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] Error in chat completion stream generator.
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] Traceback (most recent call last):
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 545, in chat_completion_stream_generator
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] async for res in result_generator:
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 583, in _process_request
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] raise request_output
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: RuntimeError('PassManager::run failed').
ERROR 09-16 10:40:29 [engine.py:176] RuntimeError('PassManager::run failed')
ERROR 09-16 10:40:29 [engine.py:176] Traceback (most recent call last):
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 174, in start
ERROR 09-16 10:40:29 [engine.py:176] self.run_engine_loop()
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 237, in run_engine_loop
ERROR 09-16 10:40:29 [engine.py:176] request_outputs = self.engine_step()
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 263, in engine_step
ERROR 09-16 10:40:29 [engine.py:176] raise e
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 246, in engine_step
ERROR 09-16 10:40:29 [engine.py:176] return self.engine.step()
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1221, in step
ERROR 09-16 10:40:29 [engine.py:176] outputs = self.model_executor.execute_model(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 277, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 145, in _driver_execute_model
ERROR 09-16 10:40:29 [engine.py:176] return self.driver_worker.execute_model(execute_model_req)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 417, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] output = self.model_runner.execute_model(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-16 10:40:29 [engine.py:176] return func(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1701, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] hidden_or_intermediate_states = model_executable(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 324, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states = self.model(input_ids, positions, intermediate_tensors,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 206, in call
ERROR 09-16 10:40:29 [engine.py:176] return self.forward(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 361, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states, residual = layer(positions, hidden_states, residual)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 614, in forward
ERROR 09-16 10:40:29 [engine.py:176] output = functional_call(module,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/_functorch/functional_call.py", line 148, in functional_call
ERROR 09-16 10:40:29 [engine.py:176] return nn.utils.stateless._functional_call(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/utils/stateless.py", line 282, in _functional_call
ERROR 09-16 10:40:29 [engine.py:176] return module(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 230, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states = self.self_attn(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 157, in forward
ERROR 09-16 10:40:29 [engine.py:176] attn_output = self.attn(q, k, v)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/layer.py", line 287, in forward
ERROR 09-16 10:40:29 [engine.py:176] return torch.ops.vllm.unified_attention(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/_ops.py", line 1158, in call
ERROR 09-16 10:40:29 [engine.py:176] return self._op(*args, **(kwargs or {}))
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/layer.py", line 459, in unified_attention
ERROR 09-16 10:40:29 [engine.py:176] output = self.impl.forward(self, query, key, value, kv_cache,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/backends/xformers.py", line 584, in forward
ERROR 09-16 10:40:29 [engine.py:176] out = PagedAttention.forward_prefix(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/paged_attn.py", line 214, in forward_prefix
ERROR 09-16 10:40:29 [engine.py:176] context_attention_fwd(
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-16 10:40:29 [engine.py:176] return func(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/prefix_prefill.py", line 862, in context_attention_fwd
ERROR 09-16 10:40:29 [engine.py:176] _fwd_kernel[grid](
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in
ERROR 09-16 10:40:29 [engine.py:176] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/runtime/jit.py", line 569, in run
ERROR 09-16 10:40:29 [engine.py:176] kernel = self.compile(src, target=target, options=options.dict)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/compiler/compiler.py", line 284, in compile
ERROR 09-16 10:40:29 [engine.py:176] next_module = compile_ir(module, metadata)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 450, in
ERROR 09-16 10:40:29 [engine.py:176] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, capability)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 341, in make_llir
ERROR 09-16 10:40:29 [engine.py:176] pm.run(mod)
ERROR 09-16 10:40:29 [engine.py:176] RuntimeError: PassManager::run failed
ERROR 09-16 10:40:31 [multiproc_worker_utils.py:116] Worker VllmWorkerProcess pid 3177235 died, exit code: -15
INFO 09-16 10:40:31 [multiproc_worker_utils.py:120] Killing local vLLM worker processes
(APIServer pid=3176541) INFO: Shutting down
(APIServer pid=3176541) INFO: Waiting for application shutdown.
(APIServer pid=3176541) INFO: Application shutdown complete.
(APIServer pid=3176541) INFO: Finished server process [3176541]
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 317, in _bootstrap
util._exit_function()
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/util.py", line 360, in _exit_function
p.join()
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/popen_fork.py", line 43, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 445, in signal_handler
raise KeyboardInterrupt("MQLLMEngine terminated")
KeyboardInterrupt: MQLLMEngine terminated
[rank0]:[W916 10:40:34.350294435 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions