Description
Your current environment
vllm: 0.10.1.1
transformers: 4.56.1
Launch command
export CUDA_VISIBLE_DEVICES=0,1,2,3 && nohup vllm serve Qwen3-32B-AWQ \
  --tensor-parallel-size 4 \
  --dtype half \
  --cpu-offload-gb 1 \
  --gpu-memory-utilization 0.9 \
  --host 0.0.0.0 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --port 10000 > qwen3-32B.log 2>&1 & disown && tail -fn 200 qwen3-32B.log
🐛 Describe the bug
The model starts up successfully, but as soon as a chat request is made, the following error is raised:
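For reference, the request that triggers the failure is an ordinary OpenAI-compatible chat completion against the server started above. A minimal sketch of the payload (model name and port taken from the launch command; the message content is arbitrary and only illustrative):

```python
import json

# Minimal OpenAI-compatible chat payload; sending it to
# POST http://<host>:10000/v1/chat/completions on the server launched above
# reportedly triggers the Triton compile failure during the first prefill.
payload = {
    "model": "Qwen3-32B-AWQ",                              # served model name
    "messages": [{"role": "user", "content": "Hello"}],    # any prompt works
    "stream": True,                                        # streaming endpoint is where the error surfaces
}
print(json.dumps(payload))
```

Any client (curl, the openai Python SDK, etc.) posting an equivalent body should reproduce the crash; no special parameters appear to be required.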
error: Failures have been detected while processing an MLIR pass pipeline
/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/prefix_prefill.py:36:0: note: Pipeline failed while executing [ConvertTritonGPUToLLVM
on 'builtin.module' operation]: reproducer generated at std::errs, please share the reproducer above with Triton project.
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] Error in chat completion stream generator.
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] Traceback (most recent call last):
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 545, in chat_completion_stream_generator
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] async for res in result_generator:
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 583, in _process_request
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] raise request_output
(APIServer pid=3176541) ERROR 09-16 10:40:29 [serving_chat.py:1051] vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: RuntimeError('PassManager::run failed').
ERROR 09-16 10:40:29 [engine.py:176] RuntimeError('PassManager::run failed')
ERROR 09-16 10:40:29 [engine.py:176] Traceback (most recent call last):
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 174, in start
ERROR 09-16 10:40:29 [engine.py:176] self.run_engine_loop()
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 237, in run_engine_loop
ERROR 09-16 10:40:29 [engine.py:176] request_outputs = self.engine_step()
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 263, in engine_step
ERROR 09-16 10:40:29 [engine.py:176] raise e
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 246, in engine_step
ERROR 09-16 10:40:29 [engine.py:176] return self.engine.step()
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1221, in step
ERROR 09-16 10:40:29 [engine.py:176] outputs = self.model_executor.execute_model(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 277, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 145, in _driver_execute_model
ERROR 09-16 10:40:29 [engine.py:176] return self.driver_worker.execute_model(execute_model_req)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 417, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] output = self.model_runner.execute_model(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-16 10:40:29 [engine.py:176] return func(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1701, in execute_model
ERROR 09-16 10:40:29 [engine.py:176] hidden_or_intermediate_states = model_executable(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 324, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states = self.model(input_ids, positions, intermediate_tensors,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 206, in call
ERROR 09-16 10:40:29 [engine.py:176] return self.forward(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 361, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states, residual = layer(positions, hidden_states, residual)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 614, in forward
ERROR 09-16 10:40:29 [engine.py:176] output = functional_call(module,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/_functorch/functional_call.py", line 148, in functional_call
ERROR 09-16 10:40:29 [engine.py:176] return nn.utils.stateless._functional_call(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/utils/stateless.py", line 282, in _functional_call
ERROR 09-16 10:40:29 [engine.py:176] return module(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 230, in forward
ERROR 09-16 10:40:29 [engine.py:176] hidden_states = self.self_attn(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/model_executor/models/qwen3.py", line 157, in forward
ERROR 09-16 10:40:29 [engine.py:176] attn_output = self.attn(q, k, v)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 09-16 10:40:29 [engine.py:176] return self._call_impl(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 09-16 10:40:29 [engine.py:176] return forward_call(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/layer.py", line 287, in forward
ERROR 09-16 10:40:29 [engine.py:176] return torch.ops.vllm.unified_attention(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/_ops.py", line 1158, in call
ERROR 09-16 10:40:29 [engine.py:176] return self._op(*args, **(kwargs or {}))
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/layer.py", line 459, in unified_attention
ERROR 09-16 10:40:29 [engine.py:176] output = self.impl.forward(self, query, key, value, kv_cache,
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/backends/xformers.py", line 584, in forward
ERROR 09-16 10:40:29 [engine.py:176] out = PagedAttention.forward_prefix(
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/paged_attn.py", line 214, in forward_prefix
ERROR 09-16 10:40:29 [engine.py:176] context_attention_fwd(
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-16 10:40:29 [engine.py:176] return func(*args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/attention/ops/prefix_prefill.py", line 862, in context_attention_fwd
ERROR 09-16 10:40:29 [engine.py:176] _fwd_kernel[grid](
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in
ERROR 09-16 10:40:29 [engine.py:176] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/runtime/jit.py", line 569, in run
ERROR 09-16 10:40:29 [engine.py:176] kernel = self.compile(src, target=target, options=options.dict)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/compiler/compiler.py", line 284, in compile
ERROR 09-16 10:40:29 [engine.py:176] next_module = compile_ir(module, metadata)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 450, in
ERROR 09-16 10:40:29 [engine.py:176] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, capability)
ERROR 09-16 10:40:29 [engine.py:176] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-16 10:40:29 [engine.py:176] File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 341, in make_llir
ERROR 09-16 10:40:29 [engine.py:176] pm.run(mod)
ERROR 09-16 10:40:29 [engine.py:176] RuntimeError: PassManager::run failed
ERROR 09-16 10:40:31 [multiproc_worker_utils.py:116] Worker VllmWorkerProcess pid 3177235 died, exit code: -15
INFO 09-16 10:40:31 [multiproc_worker_utils.py:120] Killing local vLLM worker processes
(APIServer pid=3176541) INFO: Shutting down
(APIServer pid=3176541) INFO: Waiting for application shutdown.
(APIServer pid=3176541) INFO: Application shutdown complete.
(APIServer pid=3176541) INFO: Finished server process [3176541]
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 317, in _bootstrap
util._exit_function()
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/util.py", line 360, in _exit_function
p.join()
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/popen_fork.py", line 43, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 445, in signal_handler
raise KeyboardInterrupt("MQLLMEngine terminated")
KeyboardInterrupt: MQLLMEngine terminated
[rank0]:[W916 10:40:34.350294435 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda3/envs/vllm/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '