
# Updated README.md with April 29 results #526


Open: wants to merge 4 commits into base branch `main` (showing changes from 1 commit).
85 changes: 44 additions & 41 deletions docs/dev-docker/README.md
@@ -40,14 +40,14 @@ The table below shows performance data where a local inference client is fed requests

| Model | Precision | TP Size | Input | Output | Num Prompts | Max Num Seqs | Throughput (tokens/s) |
|-------|-----------|---------|-------|--------|-------------|--------------|-----------------------|
-| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 3200 | 3200 | 16364.9 |
-| | | | 128 | 4096 | 1500 | 1500 | 12171.0 |
-| | | | 500 | 2000 | 2000 | 2000 | 13290.4 |
-| | | | 2048 | 2048 | 1500 | 1500 | 8216.5 |
-| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 1500 | 1500 | 4331.6 |
-| | | | 128 | 4096 | 1500 | 1500 | 3409.9 |
-| | | | 500 | 2000 | 2000 | 2000 | 3184.0 |
-| | | | 2048 | 2048 | 500 | 500 | 2154.3 |
+| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 3200 | 3200 | 16896.6 |
+| | | | 128 | 4096 | 1500 | 1500 | 13943.8 |
+| | | | 500 | 2000 | 2000 | 2000 | 13512.8 |
+| | | | 2048 | 2048 | 1500 | 1500 | 8444.5 |
+| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 128 | 2048 | 1500 | 1500 | 4359.9 |
+| | | | 128 | 4096 | 1500 | 1500 | 3430.9 |
+| | | | 500 | 2000 | 2000 | 2000 | 3226.8 |
+| | | | 2048 | 2048 | 500 | 500 | 2228.2 |

*TP stands for Tensor Parallelism.*
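
Throughput figures of this shape are typically collected with vLLM's `benchmarks/benchmark_throughput.py`. The sketch below is a hypothetical invocation for the first 70B row; the script path and flag names are assumed from upstream vLLM and should be verified against the version inside the container.

```bash
# Hypothetical sketch only: script path and flags assumed from upstream vLLM's
# benchmarks/benchmark_throughput.py; confirm against the container's version.
python3 /app/vllm/benchmarks/benchmark_throughput.py \
    --model amd/Llama-3.1-70B-Instruct-FP8-KV \
    --quantization fp8 \
    --kv-cache-dtype fp8 \
    --tensor-parallel-size 8 \
    --input-len 128 \
    --output-len 2048 \
    --num-prompts 3200 \
    --max-num-seqs 3200
```

Varying `--input-len`, `--output-len`, `--num-prompts`, and `--max-num-seqs` per row of the table would reproduce the remaining configurations.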

@@ -57,38 +57,38 @@ The table below shows latency measurement, which typically involves assessing the

| Model | Precision | TP Size | Batch Size | Input | Output | MI300X Latency (sec) |
|-------|-----------|----------|------------|--------|---------|-------------------|
-| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 17.411 |
-| | | | 2 | 128 | 2048 | 18.750 |
-| | | | 4 | 128 | 2048 | 19.059 |
-| | | | 8 | 128 | 2048 | 20.857 |
-| | | | 16 | 128 | 2048 | 22.670 |
-| | | | 32 | 128 | 2048 | 25.495 |
-| | | | 64 | 128 | 2048 | 34.187 |
-| | | | 128 | 128 | 2048 | 48.754 |
-| | | | 1 | 2048 | 2048 | 17.699 |
-| | | | 2 | 2048 | 2048 | 18.919 |
-| | | | 4 | 2048 | 2048 | 19.220 |
-| | | | 8 | 2048 | 2048 | 21.545 |
-| | | | 16 | 2048 | 2048 | 24.329 |
-| | | | 32 | 2048 | 2048 | 29.461 |
-| | | | 64 | 2048 | 2048 | 40.148 |
-| | | | 128 | 2048 | 2048 | 61.382 |
-| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 46.601 |
-| | | | 2 | 128 | 2048 | 46.947 |
-| | | | 4 | 128 | 2048 | 48.971 |
-| | | | 8 | 128 | 2048 | 53.021 |
-| | | | 16 | 128 | 2048 | 55.836 |
-| | | | 32 | 128 | 2048 | 64.947 |
-| | | | 64 | 128 | 2048 | 81.408 |
-| | | | 128 | 128 | 2048 | 115.296 |
-| | | | 1 | 2048 | 2048 | 46.998 |
-| | | | 2 | 2048 | 2048 | 47.619 |
-| | | | 4 | 2048 | 2048 | 51.086 |
-| | | | 8 | 2048 | 2048 | 55.706 |
-| | | | 16 | 2048 | 2048 | 61.049 |
-| | | | 32 | 2048 | 2048 | 75.842 |
-| | | | 64 | 2048 | 2048 | 103.074 |
-| | | | 128 | 2048 | 2048 | 157.705 |
+| Llama 3.1 70B (amd/Llama-3.1-70B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 15.427 |
+| | | | 2 | 128 | 2048 | 16.661 |
+| | | | 4 | 128 | 2048 | 17.326 |
+| | | | 8 | 128 | 2048 | 18.679 |
+| | | | 16 | 128 | 2048 | 20.642 |
+| | | | 32 | 128 | 2048 | 23.260 |
+| | | | 64 | 128 | 2048 | 30.498 |
+| | | | 128 | 128 | 2048 | 42.952 |
+| | | | 1 | 2048 | 2048 | 15.677 |
+| | | | 2 | 2048 | 2048 | 16.715 |
+| | | | 4 | 2048 | 2048 | 17.684 |
+| | | | 8 | 2048 | 2048 | 19.444 |
+| | | | 16 | 2048 | 2048 | 22.282 |
+| | | | 32 | 2048 | 2048 | 26.545 |
+| | | | 64 | 2048 | 2048 | 36.651 |
+| | | | 128 | 2048 | 2048 | 55.949 |
+| Llama 3.1 405B (amd/Llama-3.1-405B-Instruct-FP8-KV) | FP8 | 8 | 1 | 128 | 2048 | 45.294 |
+| | | | 2 | 128 | 2048 | 46.166 |
+| | | | 4 | 128 | 2048 | 47.867 |
+| | | | 8 | 128 | 2048 | 51.065 |
+| | | | 16 | 128 | 2048 | 54.304 |
+| | | | 32 | 128 | 2048 | 63.078 |
+| | | | 64 | 128 | 2048 | 81.906 |
+| | | | 128 | 128 | 2048 | 108.097 |
+| | | | 1 | 2048 | 2048 | 46.003 |
+| | | | 2 | 2048 | 2048 | 46.596 |
+| | | | 4 | 2048 | 2048 | 49.273 |
+| | | | 8 | 2048 | 2048 | 53.762 |
+| | | | 16 | 2048 | 2048 | 59.629 |
+| | | | 32 | 2048 | 2048 | 73.753 |
+| | | | 64 | 2048 | 2048 | 103.530 |
+| | | | 128 | 2048 | 2048 | 151.785 |

*TP stands for Tensor Parallelism.*
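
Latency numbers like these are typically measured with vLLM's `benchmarks/benchmark_latency.py`, which reports the end-to-end time to generate the full output for a single batch. A hypothetical sketch for the first 70B row follows, with the same caveat that the script path and flag names are assumed from upstream vLLM:

```bash
# Hypothetical sketch only: script path and flags assumed from upstream vLLM's
# benchmarks/benchmark_latency.py; confirm against the container's version.
python3 /app/vllm/benchmarks/benchmark_latency.py \
    --model amd/Llama-3.1-70B-Instruct-FP8-KV \
    --quantization fp8 \
    --kv-cache-dtype fp8 \
    --tensor-parallel-size 8 \
    --batch-size 1 \
    --input-len 128 \
    --output-len 2048
```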

@@ -490,7 +490,7 @@ To reproduce the release docker:
```bash
git clone https://github.com/ROCm/vllm.git
cd vllm
-git checkout b8498bc4a1c2aae1e25cfc780db0eadbc4716c67
+git checkout c43debd43c4d8a7e4fdeff4c069c5970e5e701c0
docker build -f docker/Dockerfile.rocm -t <your_tag> --build-arg USE_CYTHON=1 .
```
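
Once built, the image can be started with the usual ROCm device mappings. The sketch below is a generic example rather than a command from this README; the flags beyond the `/dev/kfd` and `/dev/dri` device mappings are common conventions, so adjust mounts, shared-memory size, and the image tag to your environment.

```bash
# Generic ROCm container launch sketch; only the /dev/kfd and /dev/dri device
# mappings are strictly ROCm-specific, the rest are common conveniences.
docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host --shm-size 16G \
    --security-opt seccomp=unconfined \
    -v "$(pwd)":/workspace \
    <your_tag>
```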

@@ -507,6 +507,9 @@ Use AITER release candidate branch instead:

## Changelog

+20250415_aiter:
@gshtras (Collaborator) commented on Apr 28, 2025:

> Why do we keep calling those *_aiter?
> Which also reminds me, the above section about using the aiter_integration_final branch is no longer correct; the branch is long deprecated.

Author replied:

> I'm not sure, I was just matching the names from previous releases. Please let me know if there's another name or any notes I should add.

Collaborator replied:

> It is not removed? I think something went wrong in the last commit.

Author replied:

> I removed that section and the "_aiter". If you could advise on what notes to add to the changelog for this container, that would be appreciated; otherwise I can remove that entry entirely.

+- To be added
Collaborator commented:

> Should be removed.

Author replied:

> I just updated the note per Teresa's comment in this morning's meeting. Let me know if further changes are needed, thanks!


20250410_aiter:
- 2-stage MoE
- MLA from AITER