examples/offline_inference/profiling_tpu/README.md (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ This script is used to profile the TPU performance of vLLM for specific prefill
Note: an actual running server is a mix of both prefill of many shapes and decode of many shapes.
-We assume you are on a TPU already (this was tested on TPU v6e) and have installed vLLM according to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/ai_accelerator/index.html).
+We assume you are on a TPU already (this was tested on TPU v6e) and have installed vLLM according to the [Google TPU installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html).
> In all examples below, we run several warmups before (so `--enforce-eager` is okay)
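For orientation, here is a minimal sketch of how the profiling script this README documents might be invoked once vLLM is installed on the TPU. The script name (`profiling.py`), the model id, and every flag other than `--enforce-eager` are assumptions based on vLLM's standard engine arguments, not taken from this diff.

```bash
# Hypothetical sketch: the script name, model id, and all flags other than
# --enforce-eager are assumptions (standard vLLM engine arguments).
cd examples/offline_inference/profiling_tpu

# --enforce-eager is acceptable here because the README's examples run
# several warmups before measuring.
python3 profiling.py \
    --model Qwen/Qwen2.5-7B-Instruct \
    --tensor-parallel-size 1 \
    --max-model-len 2048 \
    --enforce-eager
```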