Commit 1d46569

--amend
Signed-off-by: Chenyaaang <chenyangli@google.com>
1 parent 7e60550 commit 1d46569

1 file changed, +6 -6 lines changed

benchmarks/auto_tune/README.md

Lines changed: 6 additions & 6 deletions
@@ -32,15 +32,15 @@ You must set the following variables at the top of the script before execution.
 
 | Variable | Description | Example Value |
 | --- | --- | --- |
-| `BASE` | **Required.** The absolute path to your vLLM repository directory. | `"$HOME"` |
+| `BASE` | **Required.** The absolute path to the parent directory of your vLLM repository directory. | `"$HOME"` |
 | `MODEL` | **Required.** The Hugging Face model identifier to be served by vllm. | `"meta-llama/Llama-3.1-8B-Instruct"` |
 | `SYSTEM`| **Required.** The hardware you are running on. Choices: `TPU` or `GPU`. (For other systems, it might not support saving profiles) | `"TPU"` |
 | `TP` | **Required.** The tensor-parallelism size. | `1` |
 | `DOWNLOAD_DIR` | **Required.** Directory to download and load model weights from. | `""` (default download path) |
 | `INPUT_LEN` | **Required.** Request input length. | `4000` |
 | `OUTPUT_LEN` | **Required.** Request output length. | `16` |
 | `MIN_CACHE_HIT_PCT` | Prefix cache hit rate in percentage (0-100). Set to `0` to disable. | `60` |
-| `MAX_LATENCY_ALLOWED_MS` | The maximum allowed P99 end-to-end latency in milliseconds. Set to a very large number (e.g., `1000000000`) to effectively ignore the latency constraint. | `500` |
+| `MAX_LATENCY_ALLOWED_MS` | The maximum allowed P99 end-to-end latency in milliseconds. Set to a very large number (e.g., `100000000000`) to effectively ignore the latency constraint. | `500` |
 | `NUM_SEQS_LIST` | A space-separated string of `max-num-seqs` values to test. | `"128 256"` |
 | `NUM_BATCHED_TOKENS_LIST` | A space-separated string of `max-num-batched-tokens` values to test. | `"1024 2048 4096"` |
 
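The example values from the variable table can be written out as the block of assignments at the top of the script. This is a sketch, not a copy of auto_tune.sh. It assumes the repository directory is named `vllm`, so with `BASE="$HOME"` the checkout is expected at `$HOME/vllm`. `list_combinations` is a hypothetical helper. It shows how the two space-separated lists expand into pairs, assuming the script tries every pairing.

```bash
#!/usr/bin/env bash
# Example values from the table, as they might be set at the top of auto_tune.sh.
# Assumption: the vLLM checkout lives at "$BASE/vllm" (here "$HOME/vllm"),
# so BASE is the repository's parent directory, not the repository itself.
BASE="$HOME"
MODEL="meta-llama/Llama-3.1-8B-Instruct"
SYSTEM="TPU"                  # TPU or GPU
TP=1                          # tensor-parallel size
DOWNLOAD_DIR=""               # empty string: default download path
INPUT_LEN=4000                # request input length
OUTPUT_LEN=16                 # request output length
MIN_CACHE_HIT_PCT=60          # prefix cache hit rate, 0-100; 0 disables it
MAX_LATENCY_ALLOWED_MS=500    # P99 end-to-end latency cap in milliseconds
NUM_SEQS_LIST="128 256"
NUM_BATCHED_TOKENS_LIST="1024 2048 4096"

# Hypothetical helper (not from auto_tune.sh): prints every pairing of the two
# space-separated lists, relying on unquoted word splitting on whitespace.
list_combinations() {
  local num_seqs num_batched_tokens
  for num_seqs in $NUM_SEQS_LIST; do
    for num_batched_tokens in $NUM_BATCHED_TOKENS_LIST; do
      echo "max-num-seqs=$num_seqs max-num-batched-tokens=$num_batched_tokens"
    done
  done
}
```

With these lists, `list_combinations` prints 2 x 3 = 6 lines. The first is `max-num-seqs=128 max-num-batched-tokens=1024`.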
@@ -54,7 +54,7 @@ You must set the following variables at the top of the script before execution.
 cd <FOLDER_OF_THIS_SCRIPT>
 bash auto_tune.sh
 ```
-Please note that the `bash auto_tune.sh` command cannot contain full or paritial path with keyword `vllm`, otherwise `pkill -f vllm` command will also kill this script itself.
+Please note that the `bash auto_tune.sh` command cannot contain full or partial path with keyword `vllm`, otherwise `pkill -f vllm` command will also kill this script itself.
 
 
 ## Example Use Cases
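The `pkill -f vllm` caveat follows from how `pkill -f` works. It matches the pattern against each process's full command line, not just the process name. So `bash /home/user/vllm/benchmarks/auto_tune/auto_tune.sh` contains `vllm`, while `bash auto_tune.sh` run from the script's folder does not.

The guard below is a hypothetical addition, not part of auto_tune.sh. It would reject an unsafe launch early. It only inspects the path it is given, such as `$0`.

```bash
#!/usr/bin/env bash
# Hypothetical guard, not part of auto_tune.sh. `pkill -f vllm` matches the
# pattern against each process's full command line, so a script launched as
# "bash /home/user/vllm/benchmarks/auto_tune/auto_tune.sh" would be matched
# (and killed) too. This only inspects the path passed in, e.g. "$0".
invocation_is_safe() {
  local script_path="$1"
  case "$script_path" in
    *vllm*) return 1 ;;  # path mentions vllm: pkill -f vllm would hit this script
    *)      return 0 ;;  # e.g. "auto_tune.sh" run from its own folder
  esac
}
```

Near the top of a script, `invocation_is_safe "$0" || { echo "run as: bash auto_tune.sh" >&2; exit 1; }` would stop an unsafe launch before any `pkill` runs.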
@@ -68,7 +68,7 @@ Here are a few examples of how to configure the script for different goals:
 INPUT_LEN=1800
 OUTPUT_LEN=20
 MIN_CACHE_HIT_PCT=0
-MAX_LATENCY_ALLOWED_MS=1000000000 # A very large number
+MAX_LATENCY_ALLOWED_MS=100000000000 # A very large number
 ```
 
 #### 2. Maximize Throughput with a Latency Requirement
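A very large `MAX_LATENCY_ALLOWED_MS` removes the latency constraint because every measured P99 latency falls below it, so a latency check never rejects a run. The sketch below shows such a check. `meets_latency` is a hypothetical helper, not taken from auto_tune.sh. It uses `awk` on the assumption that the latency may be a decimal value, which bash's integer-only `(( ))` arithmetic cannot compare.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from auto_tune.sh: succeeds when a measured
# P99 latency (ms, possibly fractional) is within the allowed maximum.
# awk is used because bash's (( )) arithmetic only handles integers.
meets_latency() {
  local p99_ms="$1" max_allowed_ms="$2"
  # "+ 0" forces numeric comparison; exit 0 (success) when p99 <= max.
  awk -v p="$p99_ms" -v m="$max_allowed_ms" 'BEGIN { exit !((p + 0) <= (m + 0)) }'
}
```

For example, `meets_latency 523.4 500` fails, while `meets_latency 523.4 100000000000` succeeds. The second case is the effect the first use case relies on.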
@@ -108,7 +108,7 @@ After the script finishes, you will find the results in a new, timestamped direc
 ...
 best_max_num_seqs: 256, best_num_batched_tokens: 2048, best_throughput: 12.5, profile saved in: /home/user/vllm/auto-benchmark/2024_08_01_10_30/profile
 ```
-If it cannot find the best parameters, the final row will be `best_max_num_seqs: 0, best_num_batched_tokens: 0, best_throughput: 0`, it can due to either server didn't start properly, or the latency requirement too strict.
+If it cannot find the best parameters, the final row will be `best_max_num_seqs: 0, best_num_batched_tokens: 0, best_throughput: 0`. This can be due to either the server not starting properly, or the latency requirement being too strict.
 
 - **Profiler Trace**: A directory named `profile` is created inside the log directory. It contains the profiler trace file (e.g., `.xplane.pb` for TPU or a `.json` trace for GPU) from the single best-performing run.
 
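A failed search is reported as an all-zero summary line rather than as an error, so a wrapper script may want to test for it explicitly. This is a minimal sketch that assumes the summary line has exactly the format shown in the README example. `tuning_succeeded` is a hypothetical name. The sketch does not cover how to obtain the line (for example with `tail -n 1` on the results), because the results file name is not shown in this diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of auto_tune.sh: given the final summary line,
# succeed only if the search found non-zero parameters. An all-zero line means
# no valid configuration was found (server failed to start, or the latency
# requirement was too strict).
tuning_succeeded() {
  local line="$1"
  local re='best_max_num_seqs: ([0-9]+), best_num_batched_tokens: ([0-9]+), best_throughput: ([0-9.]+)'
  [[ $line =~ $re ]] || return 1   # not a summary line at all
  local seqs="${BASH_REMATCH[1]}" tokens="${BASH_REMATCH[2]}"
  # 10# forces base-10 so values are never parsed as octal.
  (( 10#$seqs != 0 && 10#$tokens != 0 ))
}
```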
@@ -128,4 +128,4 @@ The script follows a systematic process to find the optimal parameters:
 
 4. **Track Best Result**: Throughout the process, the script tracks the parameter combination that has yielded the highest valid throughput so far.
 
-5. **Profile Collection**: For the best-performing run, the script saves the vLLM profiler output, which can be used for deep-dive performance analysis with tools like TensorBoard.
+5. **Profile Collection**: For the best-performing run, the script saves the vLLM profiler output, which can be used for deep-dive performance analysis with tools like TensorBoard.
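Step 4 ("Track Best Result") amounts to a running maximum over the runs that satisfy the constraints. `select_best` is a hypothetical helper built on two assumptions:

- Each finished run has been summarized as one whitespace-separated row, `num_seqs num_batched_tokens throughput p99_latency_ms`. This is a hypothetical format, not the script's actual log layout.
- A run is valid if its latency is within the cap. Only this latency check is applied.

When no row qualifies, it prints the all-zero line the README describes for a failed search.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of step 4 (track best result), not the script's code.
# Input rows (assumed format): num_seqs num_batched_tokens throughput p99_latency_ms
# A row is "valid" here if its P99 latency is within the limit passed as $1.
select_best() {
  local max_latency_ms="$1"
  awk -v limit="$max_latency_ms" '
    BEGIN { best_seqs = 0; best_tokens = 0; best_tput = 0 }
    # Keep a running maximum over valid rows; ties keep the earlier row.
    ($4 + 0) <= (limit + 0) && ($3 + 0) > best_tput {
      best_seqs = $1; best_tokens = $2; best_tput = $3 + 0
    }
    END {
      printf "best_max_num_seqs: %s, best_num_batched_tokens: %s, best_throughput: %s\n",
             best_seqs, best_tokens, best_tput
    }'
}
```

Typical use would be `select_best "$MAX_LATENCY_ALLOWED_MS" < results.txt`, where `results.txt` is a hypothetical file of such rows.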
