Commit c43debd

Updated README.md with April 10 results (#512)
* Updated README.md with April 10 results
* Updated README.md with "2-stage MoE and MLA from AITER"
1 parent: b8498bc

File tree

1 file changed: +45 -41 lines


docs/dev-docker/README.md

Lines changed: 45 additions & 41 deletions
```diff
@@ -37,14 +37,14 @@ The table below shows performance data where a local inference client is fed req
 
 | Model | Precision | TP Size | Input | Output | Num Prompts | Max Num Seqs | Throughput (tokens/s) |
 |-------|-----------|---------|-------|--------|-------------|--------------|-----------------------|
-| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 3200 | 3200 | 15684.7 |
-| | | | 128 | 4096 | 1500 | 1500 | 11761.5 |
-| | | | 500 | 2000 | 2000 | 2000 | 12895.9 |
-| | | | 2048 | 2048 | 1500 | 1500 | 8380.7 |
-| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 1500 | 1500 | 4218.6 |
-| | | | 128 | 4096 | 1500 | 1500 | 3326.2 |
-| | | | 500 | 2000 | 2000 | 2000 | 3113.4 |
-| | | | 2048 | 2048 | 500 | 500 | 2112.1 |
+| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 3200 | 3200 | 16364.9 |
+| | | | 128 | 4096 | 1500 | 1500 | 12171.0 |
+| | | | 500 | 2000 | 2000 | 2000 | 13290.4 |
+| | | | 2048 | 2048 | 1500 | 1500 | 8216.5 |
+| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 1500 | 1500 | 4331.6 |
+| | | | 128 | 4096 | 1500 | 1500 | 3409.9 |
+| | | | 500 | 2000 | 2000 | 2000 | 3184.0 |
+| | | | 2048 | 2048 | 500 | 500 | 2154.3 |
 
 *TP stands for Tensor Parallelism.*
```
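For reference, throughput figures of this kind are typically produced with vLLM's offline throughput benchmark. A minimal sketch is below; the script path and flags follow upstream vLLM's `benchmarks/benchmark_throughput.py` and are an assumption here, not something taken from this commit:

```bash
# Hypothetical reproduction of one throughput row (70B, 128 in / 2048 out).
# Script path and flags assumed from upstream vLLM, not from this commit.
python3 benchmarks/benchmark_throughput.py \
    --model amd/Llama-3.1-70B-Instruct-FP8-KV \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 2048 \
    --num-prompts 3200 \
    --max-num-seqs 3200
```

The Input and Output columns map to `--input-len`/`--output-len`, and Num Prompts / Max Num Seqs to the last two flags.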

```diff
@@ -54,38 +54,38 @@ The table below shows latency measurement, which typically involves assessing th
 
 | Model | Precision | TP Size | Batch Size | Input | Output | MI300X Latency (sec) |
 |-------|-----------|----------|------------|--------|---------|-------------------|
-| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 17.662 |
-| | | | 2 | 128 | 2048 | 18.768 |
-| | | | 4 | 128 | 2048 | 19.282 |
-| | | | 8 | 128 | 2048 | 20.943 |
-| | | | 16 | 128 | 2048 | 23.388 |
-| | | | 32 | 128 | 2048 | 26.272 |
-| | | | 64 | 128 | 2048 | 34.514 |
-| | | | 128 | 128 | 2048 | 50.134 |
-| | | | 1 | 2048 | 2048 | 17.891 |
-| | | | 2 | 2048 | 2048 | 19.064 |
-| | | | 4 | 2048 | 2048 | 19.819 |
-| | | | 8 | 2048 | 2048 | 21.925 |
-| | | | 16 | 2048 | 2048 | 25.118 |
-| | | | 32 | 2048 | 2048 | 29.640 |
-| | | | 64 | 2048 | 2048 | 41.029 |
-| | | | 128 | 2048 | 2048 | 63.717 |
-| Llama 3.1 405B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 46.779 |
-| | | | 2 | 128 | 2048 | 47.136 |
-| | | | 4 | 128 | 2048 | 49.045 |
-| | | | 8 | 128 | 2048 | 53.145 |
-| | | | 16 | 128 | 2048 | 55.720 |
-| | | | 32 | 128 | 2048 | 64.996 |
-| | | | 64 | 128 | 2048 | 81.950 |
-| | | | 128 | 128 | 2048 | 114.799 |
-| | | | 1 | 2048 | 2048 | 47.448 |
-| | | | 2 | 2048 | 2048 | 47.764 |
-| | | | 4 | 2048 | 2048 | 51.338 |
-| | | | 8 | 2048 | 2048 | 56.915 |
-| | | | 16 | 2048 | 2048 | 61.934 |
-| | | | 32 | 2048 | 2048 | 76.136 |
-| | | | 64 | 2048 | 2048 | 104.868 |
-| | | | 128 | 2048 | 2048 | 159.555 |
+| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 17.411 |
+| | | | 2 | 128 | 2048 | 18.750 |
+| | | | 4 | 128 | 2048 | 19.059 |
+| | | | 8 | 128 | 2048 | 20.857 |
+| | | | 16 | 128 | 2048 | 22.670 |
+| | | | 32 | 128 | 2048 | 25.495 |
+| | | | 64 | 128 | 2048 | 34.187 |
+| | | | 128 | 128 | 2048 | 48.754 |
+| | | | 1 | 2048 | 2048 | 17.699 |
+| | | | 2 | 2048 | 2048 | 18.919 |
+| | | | 4 | 2048 | 2048 | 19.220 |
+| | | | 8 | 2048 | 2048 | 21.545 |
+| | | | 16 | 2048 | 2048 | 24.329 |
+| | | | 32 | 2048 | 2048 | 29.461 |
+| | | | 64 | 2048 | 2048 | 40.148 |
+| | | | 128 | 2048 | 2048 | 61.382 |
+| Llama 3.1 405B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 46.601 |
+| | | | 2 | 128 | 2048 | 46.947 |
+| | | | 4 | 128 | 2048 | 48.971 |
+| | | | 8 | 128 | 2048 | 53.021 |
+| | | | 16 | 128 | 2048 | 55.836 |
+| | | | 32 | 128 | 2048 | 64.947 |
+| | | | 64 | 128 | 2048 | 81.408 |
+| | | | 128 | 128 | 2048 | 115.296 |
+| | | | 1 | 2048 | 2048 | 46.998 |
+| | | | 2 | 2048 | 2048 | 47.619 |
+| | | | 4 | 2048 | 2048 | 51.086 |
+| | | | 8 | 2048 | 2048 | 55.706 |
+| | | | 16 | 2048 | 2048 | 61.049 |
+| | | | 32 | 2048 | 2048 | 75.842 |
+| | | | 64 | 2048 | 2048 | 103.074 |
+| | | | 128 | 2048 | 2048 | 157.705 |
 
 *TP stands for Tensor Parallelism.*
```
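The latency table reads the same way, with Batch Size replacing the prompt-count columns. A hedged sketch using vLLM's `benchmarks/benchmark_latency.py` (again an assumption from upstream vLLM, not taken from this commit):

```bash
# Hypothetical reproduction of one latency row (70B, batch 1, 128 in / 2048 out).
# Script path and flags assumed from upstream vLLM, not from this commit.
python3 benchmarks/benchmark_latency.py \
    --model amd/Llama-3.1-70B-Instruct-FP8-KV \
    --tensor-parallel-size 8 \
    --batch-size 1 \
    --input-len 128 \
    --output-len 2048
```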

````diff
@@ -487,7 +487,7 @@ To reproduce the release docker:
 ```bash
 git clone https://github.com/ROCm/vllm.git
 cd vllm
-git checkout 51641aaa70d4dfb0ea1f3674b47a7d85f718847c
+git checkout b8498bc4a1c2aae1e25cfc780db0eadbc4716c67
 docker build -f Dockerfile.rocm -t <your_tag> --build-arg USE_CYTHON=1 .
 ```
````
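Once built, the image is started like any other ROCm container. A sketch of a typical run command follows; the device and security flags are the standard ones for ROCm GPU access and are an assumption, not part of this commit:

```bash
# Hypothetical run of the image built above; <your_tag> as in the build step.
# GPU-access flags are the usual ROCm ones, assumed rather than taken from this commit.
docker run -it --rm \
    --network=host \
    --ipc=host \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    <your_tag>
```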

```diff
@@ -504,6 +504,10 @@ Use AITER release candidate branch instead:
 
 ## Changelog
 
+20250410_aiter:
+- 2-stage MoE
+- MLA from AITER
+
 20250325_aiter:
 - Improved DeepSeek-V3/R1 performance
 - Initial Gemma-3 enablement
```
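The AITER-backed paths named in the new changelog entry (the 2-stage MoE and the MLA attention backend) are runtime toggles rather than baked-in defaults. A hedged sketch of enabling them; `VLLM_ROCM_USE_AITER` is the toggle in upstream vLLM's ROCm support, and the exact variable in this dev-docker build should be checked against its docs:

```bash
# Hypothetical: switch on the AITER kernels before serving a MoE/MLA model.
# Variable name assumed from upstream vLLM's ROCm integration, not from this commit.
export VLLM_ROCM_USE_AITER=1
vllm serve deepseek-ai/DeepSeek-R1 --tensor-parallel-size 8
```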
