removing quant and kv-cache fp8 from deepseek run instructions #509
base: main
Conversation
Please add a description of why you are proposing this.
docs/dev-docker/README.md (outdated)
@@ -377,7 +377,7 @@ python3 /app/vllm/benchmarks/benchmark_serving.py \
 # Offline throughput
 python3 /app/vllm/benchmarks/benchmark_throughput.py --model deepseek-ai/DeepSeek-V3 \
     --input-len <> --output-len <> --tensor-parallel-size 8 \
-    --quantization fp8 --kv-cache-dtype fp8 --dtype float16 \
+    --dtype float16 \
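For reference, a sketch of how the offline-throughput example reads with this change applied; the trailing backslash in the hunk means further flags follow in the README (unchanged by this PR), and the `<>` placeholders are filled in by the user:

```bash
# Offline throughput benchmark for DeepSeek-V3, with the FP8 quantization
# and FP8 KV-cache flags removed (any flags after --dtype in the README
# are unchanged and omitted here)
python3 /app/vllm/benchmarks/benchmark_throughput.py --model deepseek-ai/DeepSeek-V3 \
    --input-len <> --output-len <> --tensor-parallel-size 8 \
    --dtype float16
```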
Can you specify why?
It raises an error:
export VLLM_MLA_DISABLE=0
export VLLM_USE_AITER=1
export VLLM_USE_TRITON_FLASH_ATTN=1
python3 /app/vllm/benchmarks/benchmark_throughput.py --model /data/DeepSeek-R1/ \
    --input-len 128 --output-len 128 --tensor-parallel-size 8 \
    --quantization fp8 --kv-cache-dtype fp8 --dtype bfloat16 \
    --max-model-len 32768 --block-size=1 --trust-remote-code
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/triton_mla.py", line 63, in __init__
[rank0]: raise NotImplementedError(
[rank0]: NotImplementedError: TritonMLA with FP8 KV cache not yet supported
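Dropping the two FP8 flags, as this PR proposes for the README, should avoid the TritonMLA FP8 KV-cache check above. A sketch of the adjusted reproduction, using the same environment and paths as the failing command:

```bash
# Same reproduction as above, minus --quantization fp8 and --kv-cache-dtype fp8,
# mirroring the README change proposed in this PR
export VLLM_MLA_DISABLE=0
export VLLM_USE_AITER=1
export VLLM_USE_TRITON_FLASH_ATTN=1
python3 /app/vllm/benchmarks/benchmark_throughput.py --model /data/DeepSeek-R1/ \
    --input-len 128 --output-len 128 --tensor-parallel-size 8 \
    --dtype bfloat16 --max-model-len 32768 --block-size=1 --trust-remote-code
```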
Why is Triton MLA being used with AITER? cc @qli88
@arakowsk-amd are you using the latest version? If you'd like, we can discuss through Teams.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
No description provided.