Description
The model to consider.
https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite
The closest model vllm already supports.
https://huggingface.co/deepseek-ai/DeepSeek-V3
What's your difficulty of supporting the model you want?
The operators provided by torch_npu do not support this model. The errors are as follows.
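For context, this is a rough sketch of the failing script (test_vllm_ascend_deepseek_v2.py from the traceback). Only the llm.generate(prompt_token_ids=..., sampling_params=...) call is taken from the traceback; the model path, prompt, tensor_parallel_size, and sampling settings are illustrative assumptions:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Illustrative placeholder; a local checkpoint path was used in the original run.
model_path = "deepseek-ai/DeepSeek-V2-Lite"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt_token_ids = [tokenizer.encode("Hello, my name is")]

llm = LLM(
    model=model_path,
    trust_remote_code=True,
    tensor_parallel_size=2,  # assumption: the traceback goes through the multiprocess distributed executor
    max_model_len=4096,
)
sampling_params = SamplingParams(temperature=0.0, max_tokens=32)

# This is the call at test_vllm_ascend_deepseek_v2.py:36 in the traceback.
outputs = llm.generate(prompt_token_ids=prompt_token_ids,
                       sampling_params=sampling_params)
for output in outputs:
    print(output.outputs[0].text)

Running a script along these lines on NPU fails while torchair compiles the graph, producing the traceback below: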
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/libingnan/project/test_vllm/test_vllm_ascend_deepseek_v2.py", line 36, in
[rank0]: outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
[rank0]: File "/vllm-workspace/vllm-main/vllm/utils.py", line 1212, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm-main/vllm/entrypoints/llm.py", line 479, in generate
[rank0]: outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]: File "/vllm-workspace/vllm-main/vllm/entrypoints/llm.py", line 1464, in _run_engine
[rank0]: step_outputs = self.llm_engine.step()
[rank0]: File "/vllm-workspace/vllm-main/vllm/engine/llm_engine.py", line 1405, in step
[rank0]: outputs = self.model_executor.execute_model(
[rank0]: File "/vllm-workspace/vllm-main/vllm/executor/executor_base.py", line 299, in execute_model
[rank0]: driver_outputs = self._driver_execute_model(execute_model_req)
[rank0]: File "/vllm-workspace/vllm-main/vllm/executor/mp_distributed_executor.py", line 144, in _driver_execute_model
[rank0]: return self.driver_worker.execute_model(execute_model_req)
[rank0]: File "/vllm-workspace/vllm-main/vllm/worker/worker_base.py", line 420, in execute_model
[rank0]: output = self.model_runner.execute_model(
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm-ascend-main/vllm_ascend/worker/model_runner.py", line 1402, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm-ascend-main/vllm_ascend/models/deepseek_v2.py", line 670, in forward
[rank0]: def forward(
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1100, in forward
[rank0]: return compiled_fn(full_args)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 321, in runtime_wrapper
[rank0]: all_outs = call_func_at_runtime_with_args(
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 124, in call_func_at_runtime_with_args
[rank0]: out = normalize_as_list(f(args))
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 667, in inner_fn
[rank0]: outs = compiled_fn(args)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 488, in wrapper
[rank0]: return compiled_fn(runtime_args)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 98, in g
[rank0]: return f(*args)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/dynamo/torchair/npu_fx_compiler.py", line 260, in call
[rank0]: gm_result = self.runner(*args, **kwargs)
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/dynamo/torchair/_ge_concrete_graph/fx2ge_converter.py", line 537, in call
[rank0]: self.compile()
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/dynamo/torchair/_ge_concrete_graph/fx2ge_converter.py", line 627, in compile
[rank0]: self.graph.compile()
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/dynamo/torchair/ge/_ge_graph.py", line 657, in compile
[rank0]: self._executor.compile()
[rank0]: File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/dynamo/torchair/utils/error_code.py", line 46, in wapper
[rank0]: raise type(e)("\n".join(msg))
[rank0]: RuntimeError: EZ9999: Inner Error!
[rank0]: EZ9999: [PID: 62938] 2025-05-27-06:52:12.455.807 numHeads / numKvHeads = 8, MLA only support {32, 64, 128}.[FUNC:CheckMlaAttrs][FILE:incre_flash_attention_tiling_check.cc][LINE:1218]
[rank0]: TraceBack (most recent call last):
[rank0]: Assert (((funcs->tiling)(reinterpret_cast<gert::TilingContext *>(tiling_context_holder.context))) == ge::GRAPH_SUCCESS) failed[FUNC:RtParseAndTiling][FILE:op_tiling_rt2.cc][LINE:526]
[rank0]: [GenTask][CalcExtOpRunningParam] CalcTilingSinkRunningParam failed.[FUNC:CalcExtOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:259]
[rank0]: [GenTask][CalcOpRunningParam] CalcExtOpRunningParam failed.[FUNC:CalcOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:227]
[rank0]: Call Calculate op:FusedInferAttentionScore(FusedInferAttentionScore) running param failed[FUNC:CalcOpParam][FILE:graph_builder.cc][LINE:211]
[rank0]: [Call][PreRun] Failed, graph_id:0, session_id:0.[FUNC:CompileGraph][FILE:graph_manager.cc][LINE:4545]
[rank0]: [Compile][Graph]Compile graph failed, error code:1343225857, session_id:0, graph_id:0.[FUNC:CompileGraph][FILE:ge_api.cc][LINE:1280]
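The failing check is the FusedInferAttentionScore operator's MLA tiling (incre_flash_attention_tiling_check.cc): it only accepts a numHeads / numKvHeads ratio of 32, 64, or 128, while this model produces a ratio of 8. The difference from DeepSeek-V3 shows up directly in the model configs; the snippet below is a hedged sketch for inspecting them (it assumes Hub access and that both remote configs expose num_attention_heads and num_key_value_heads):

from transformers import AutoConfig

# Both repos ship custom config classes, hence trust_remote_code.
for name in ("deepseek-ai/DeepSeek-V2-Lite", "deepseek-ai/DeepSeek-V3"):
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    print(name,
          "num_attention_heads =", cfg.num_attention_heads,
          "num_key_value_heads =", cfg.num_key_value_heads)

# DeepSeek-V2-Lite reports 16 attention heads while DeepSeek-V3 reports 128,
# which is presumably why the per-rank head ratio seen by the kernel here is 8
# and falls outside the supported set {32, 64, 128}.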