3. Since vLLM uses `uv`, ensure the following index strategy is applied:
    - Via environment variable:

      ```bash
      export UV_INDEX_STRATEGY=unsafe-best-match
      ```

    - Or via CLI flag:

      ```bash
      --index-strategy unsafe-best-match
      ```

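Either form changes how `uv` resolves packages that are available from more than one index. A minimal sketch of the flag form (the `torch` argument here is only illustrative):

```bash
# Consider candidates from every configured index and pick the best matching
# version, instead of stopping at the first index that carries the package.
uv pip install --index-strategy unsafe-best-match torch
```
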
If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.

## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
`torch2.7.0+cu12.6`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.

- Important indexes at the moment include (an example install command follows this list):

| Platform | `--extra-index-url` |
|----------|---------------------|
| CUDA 12.8 | [https://download.pytorch.org/whl/cu128](https://download.pytorch.org/whl/cu128) |
| CPU | [https://download.pytorch.org/whl/cpu](https://download.pytorch.org/whl/cpu) |
| ROCm 6.2.4 | [https://download.pytorch.org/whl/rocm6.2.4](https://download.pytorch.org/whl/rocm6.2.4) |
| ROCm 6.3 | [https://download.pytorch.org/whl/rocm6.3](https://download.pytorch.org/whl/rocm6.3) |
| XPU | [https://download.pytorch.org/whl/xpu](https://download.pytorch.org/whl/xpu) |

- Update the files below to match the CUDA version used above. This makes sure that the release vLLM wheel is tested on CI.
    - `.buildkite/release-pipeline.yaml`
    - `.buildkite/scripts/upload-wheels.sh`
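
Tying the extra index together with the `uv` index strategy from earlier, a minimal sketch of installing the CUDA 12.8 builds (the package list is illustrative and not copied from vLLM's Dockerfiles):

```bash
# Illustrative only: pull the CUDA 12.8 builds from the PyTorch index while
# letting uv fall back to PyPI for everything else.
export UV_INDEX_STRATEGY=unsafe-best-match
uv pip install --extra-index-url https://download.pytorch.org/whl/cu128 \
    torch torchvision torchaudio
```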

## Address long vLLM build time

it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.

While ongoing efforts like [#17419](gh-issue:17419)
address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:

releases (which would take too much time), they can be built from
source to unblock the update process.

### FlashInfer

Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):
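
The linked Dockerfile lines are the authoritative commands; the following is only a hedged sketch of that from-source build, where the pinned tag, CUDA arch list, and `FLASHINFER_ENABLE_AOT` setting are assumptions:

```bash
# Sketch only: build FlashInfer against the already-installed torch2.7.0+cu128.
# The tag, arch list, and AOT flag are assumptions; see the linked Dockerfile
# for the exact values used in CI.
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0a 12.0'
export FLASHINFER_ENABLE_AOT=1
uv pip install --system --no-build-isolation \
    "git+https://github.com/flashinfer-ai/flashinfer.git@v0.2.6.post1"
```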
One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as [this FlashInfer wheel link](https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl). For future releases, contact the PyTorch release
team if you want to get the package published there.

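Installing from such a cached wheel is then a single command, for example using the wheel URL referenced above:

```bash
# Install the pre-built FlashInfer wheel directly from its cached location,
# skipping the ~30 minute from-source build.
uv pip install \
    "https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
```
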
### xFormers

Similar to FlashInfer, here is how to build and install xFormers from source:
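
As with FlashInfer, the following is only a hedged sketch of the same pattern; the pinned tag and arch list are assumptions, and vLLM's build files remain the source of truth:

```bash
# Sketch only: build xFormers against the already-installed torch2.7.0+cu128.
# The tag and arch list are assumptions; check vLLM's Dockerfile for the
# values actually used in CI.
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a'
MAX_JOBS=16 uv pip install --system --no-build-isolation \
    "git+https://github.com/facebookresearch/xformers.git@v0.0.30"
```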