
Commit 678a35f

windsonsea authored and sfeng33 committed
[Docs] Replace two list with tables in intel_gaudi.md (vllm-project#20414)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
1 parent c83e031 commit 678a35f

1 file changed: +24 -19 lines changed

docs/getting_started/installation/intel_gaudi.md

Lines changed: 24 additions & 19 deletions
@@ -198,7 +198,12 @@ INFO 08-01 21:37:59 hpu_model_runner.py:504] Decode bucket config (min, step, ma
 INFO 08-01 21:37:59 hpu_model_runner.py:509] Generated 48 decode buckets: [(1, 128), (1, 256), (1, 384), (1, 512), (1, 640), (1, 768), (1, 896), (1, 1024), (1, 1152), (1, 1280), (1, 1408), (1, 1536), (1, 1664), (1, 1792), (1, 1920), (1, 2048), (2, 128), (2, 256), (2, 384), (2, 512), (2, 640), (2, 768), (2, 896), (2, 1024), (2, 1152), (2, 1280), (2, 1408), (2, 1536), (2, 1664), (2, 1792), (2, 1920), (2, 2048), (4, 128), (4, 256), (4, 384), (4, 512), (4, 640), (4, 768), (4, 896), (4, 1024), (4, 1152), (4, 1280), (4, 1408), (4, 1536), (4, 1664), (4, 1792), (4, 1920), (4, 2048)]
 ```

-`min` determines the lowest value of the bucket. `step` determines the interval between buckets, and `max` determines the upper bound of the bucket. Furthermore, interval between `min` and `step` has special handling -- `min` gets multiplied by consecutive powers of two, until `step` gets reached. We call this the ramp-up phase and it is used for handling lower batch sizes with minimum wastage, while allowing larger padding on larger batch sizes.
+| Parameter | Description |
+|----------------|-----------------------------------------------------------------------------|
+| `min` | Determines the lowest value of the bucket. |
+| `step` | Determines the interval between buckets. |
+| `max` | Determines the upper bound of the bucket. |
+| Ramp-up phase | A special handling phase applied between `min` and `step`:<br/>- `min` is multiplied by consecutive powers of two until `step` is reached.<br/>- Minimizes resource wastage for small batch sizes.<br/>- Allows larger padding for larger batches. |

 Example (with ramp-up):
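
The bucketing scheme summarized in the new table can be sketched in a few lines of Python. The sketch below is illustrative only, not vLLM's actual HPU implementation; the function name and the example configs are assumptions made for the sketch:

```python
# Illustrative sketch of (min, step, max) bucket expansion -- not vLLM code.
def bucket_range(cfg: tuple[int, int, int]) -> list[int]:
    """Expand a (min, step, max) bucket config into concrete bucket values."""
    bmin, bstep, bmax = cfg
    values = []
    # Ramp-up phase: multiply `min` by consecutive powers of two until `step`
    # is reached, keeping padding small for the lowest batch sizes.
    current = bmin
    while current < bstep and current <= bmax:
        values.append(current)
        current *= 2
    # Regular phase: advance from `step` to `max` in increments of `step`,
    # allowing larger padding for larger batches.
    current = bstep
    while current <= bmax:
        values.append(current)
        current += bstep
    return values

# Example configs (illustrative): a batch-size config that expands to [1, 2, 4]
# and a block config of (128, 128, 2048) give 3 x 16 = 48 decode buckets,
# matching the count in the log line above.
print(bucket_range((1, 4, 4)))         # [1, 2, 4]
print(bucket_range((128, 128, 2048)))  # [128, 256, ..., 2048]
```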

@@ -349,28 +354,28 @@ Each described step is logged by vLLM server, as follows (negative values corres

 - `VLLM_{phase}_{dim}_BUCKET_{param}` - collection of 12 environment variables configuring ranges of bucketing mechanism

-* `{phase}` is either `PROMPT` or `DECODE`
+    * `{phase}` is either `PROMPT` or `DECODE`

-* `{dim}` is either `BS`, `SEQ` or `BLOCK`
+    * `{dim}` is either `BS`, `SEQ` or `BLOCK`

-* `{param}` is either `MIN`, `STEP` or `MAX`
+    * `{param}` is either `MIN`, `STEP` or `MAX`

-* Default values:
+    * Default values:

-  - Prompt:
-    - batch size min (`VLLM_PROMPT_BS_BUCKET_MIN`): `1`
-    - batch size step (`VLLM_PROMPT_BS_BUCKET_STEP`): `min(max_num_seqs, 32)`
-    - batch size max (`VLLM_PROMPT_BS_BUCKET_MAX`): `min(max_num_seqs, 64)`
-    - sequence length min (`VLLM_PROMPT_SEQ_BUCKET_MIN`): `block_size`
-    - sequence length step (`VLLM_PROMPT_SEQ_BUCKET_STEP`): `block_size`
-    - sequence length max (`VLLM_PROMPT_SEQ_BUCKET_MAX`): `max_model_len`
-  - Decode:
-    - batch size min (`VLLM_DECODE_BS_BUCKET_MIN`): `1`
-    - batch size step (`VLLM_DECODE_BS_BUCKET_STEP`): `min(max_num_seqs, 32)`
-    - batch size max (`VLLM_DECODE_BS_BUCKET_MAX`): `max_num_seqs`
-    - sequence length min (`VLLM_DECODE_BLOCK_BUCKET_MIN`): `block_size`
-    - sequence length step (`VLLM_DECODE_BLOCK_BUCKET_STEP`): `block_size`
-    - sequence length max (`VLLM_DECODE_BLOCK_BUCKET_MAX`): `max(128, (max_num_seqs*max_model_len)/block_size)`
+    | `{phase}` | Parameter | Env Variable | Value Expression |
+    |-----------|-----------|--------------|------------------|
+    | Prompt | Batch size min | `VLLM_PROMPT_BS_BUCKET_MIN` | `1` |
+    | Prompt | Batch size step | `VLLM_PROMPT_BS_BUCKET_STEP` | `min(max_num_seqs, 32)` |
+    | Prompt | Batch size max | `VLLM_PROMPT_BS_BUCKET_MAX` | `min(max_num_seqs, 64)` |
+    | Prompt | Sequence length min | `VLLM_PROMPT_SEQ_BUCKET_MIN` | `block_size` |
+    | Prompt | Sequence length step | `VLLM_PROMPT_SEQ_BUCKET_STEP` | `block_size` |
+    | Prompt | Sequence length max | `VLLM_PROMPT_SEQ_BUCKET_MAX` | `max_model_len` |
+    | Decode | Batch size min | `VLLM_DECODE_BS_BUCKET_MIN` | `1` |
+    | Decode | Batch size step | `VLLM_DECODE_BS_BUCKET_STEP` | `min(max_num_seqs, 32)` |
+    | Decode | Batch size max | `VLLM_DECODE_BS_BUCKET_MAX` | `max_num_seqs` |
+    | Decode | Sequence length min | `VLLM_DECODE_BLOCK_BUCKET_MIN` | `block_size` |
+    | Decode | Sequence length step | `VLLM_DECODE_BLOCK_BUCKET_STEP` | `block_size` |
+    | Decode | Sequence length max | `VLLM_DECODE_BLOCK_BUCKET_MAX` | `max(128, (max_num_seqs*max_model_len)/block_size)` |

 Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM execution:

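The 12 bucketing variables in the table above are ordinary environment variables, so they can be overridden before the vLLM server or engine starts. A hypothetical Python example with made-up values, not recommended settings:

```python
# Hypothetical override of the bucketing ranges documented above; the values
# are illustrative only. Set them before the vLLM engine initializes.
import os

# Prompt batch-size buckets: ramp up from 1, then step by 8 up to 16.
os.environ["VLLM_PROMPT_BS_BUCKET_MIN"] = "1"
os.environ["VLLM_PROMPT_BS_BUCKET_STEP"] = "8"
os.environ["VLLM_PROMPT_BS_BUCKET_MAX"] = "16"

# Decode block buckets: keep min/step at the block size, cap the max below the
# default max(128, (max_num_seqs*max_model_len)/block_size).
os.environ["VLLM_DECODE_BLOCK_BUCKET_MIN"] = "128"
os.environ["VLLM_DECODE_BLOCK_BUCKET_STEP"] = "128"
os.environ["VLLM_DECODE_BLOCK_BUCKET_MAX"] = "1024"
```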