Update fp8 paged attention for MI308 #592

Draft · wants to merge 18 commits into base: main

Conversation


@amd-xiaoyu12 commented Jul 9, 2025

Please direct your PRs to the upstream vLLM repository (https://github.com/vllm-project/vllm.git).

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception.

Summary:
Support full fp8 MFMA with warp-level dynamic query quantization to improve fp8 performance on MI308, which can also benefit other MI300-series accelerators and newer hardware.
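
As a rough illustration of the warp-level dynamic query quantization mentioned above, the PyTorch sketch below computes a per-token, per-head scale from the query's maximum magnitude and casts the query to fp8; in the kernel, the reduction producing that maximum is performed by the warp that owns the token. Function names, tensor shapes, and the fp8 variant are illustrative assumptions, not the actual kernel interface in this PR.

```python
import torch

# FP8 format assumed here: e4m3fn (max ~448). MI300-class GPUs use the
# e4m3fnuz variant (max 240); the quantization logic is the same.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def quantize_query_per_token(query: torch.Tensor):
    """Dynamically quantize each query token to fp8.

    query: [num_tokens, num_heads, head_size] in fp16/bf16.
    Returns (q_fp8, scales) with scales shaped [num_tokens, num_heads, 1].
    """
    # Per-token, per-head max magnitude (done by one warp per token in-kernel).
    amax = query.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6)
    scales = amax / FP8_MAX
    q_fp8 = (query / scales).to(torch.float8_e4m3fn)
    return q_fp8, scales

# After QK^T runs on fp8 MFMA, the accumulator is rescaled by
# (query_scale * key_scale) before the softmax.
```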

  • Performance (results shown in attached image)
  • Unit test - attention output (results shown in attached image)
  • lm-eval-harness ppl test (results shown in attached image)

@amd-xiaoyu12 amd-xiaoyu12 changed the title Update fp8 paged attention Update fp8 paged attention for MI308 Jul 9, 2025