---
title: Update PyTorch version on vLLM OSS CI/CD
---

vLLM's current policy is to always use the latest PyTorch stable
release in CI/CD. It is standard practice to submit a PR to update the
PyTorch version as early as possible when a new [PyTorch stable
release](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence) becomes available.
This process is non-trivial due to the gap between PyTorch
releases. Using [#16859](https://github.com/vllm-project/vllm/pull/16859) as
an example, this document outlines the common steps to achieve this update, along with
a list of potential issues and how to address them.

## Test PyTorch release candidates (RCs)

Updating PyTorch in vLLM after the official release is not
ideal because any issues discovered at that point can only be resolved
by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM against PyTorch release candidates (RCs) to ensure
compatibility before each release.

PyTorch release candidates can be downloaded from the PyTorch test index at https://download.pytorch.org/whl/test.
For example, the torch 2.7.0 RC built against CUDA 12.8 can be installed using the following command:

```
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```

When the final RC is ready for testing, it will be announced to the community
on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-announcements).
After this announcement, we can begin testing vLLM integration by drafting a pull request
that follows this three-step process:

1. Update the requirements files in https://github.com/vllm-project/vllm/tree/main/requirements
to point to the new releases of torch, torchvision, and torchaudio.
2. Use `--extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>` to
get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`,
and `rocm6.2.4`.
3. As vLLM uses uv, make sure that the `unsafe-best-match` strategy is set, either
via the `UV_INDEX_STRATEGY` env variable or via `--index-strategy unsafe-best-match`,
as shown in the sketch after this list.
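
For reference, here is a minimal sketch of what steps 2 and 3 look like when run by hand with uv; the package set and the `cu128` platform are illustrative, and the real changes belong in the requirements files and CI configuration rather than an ad-hoc command:

```
# Resolve release-candidate wheels from the PyTorch test index while still
# allowing other dependencies to come from PyPI (uv's unsafe-best-match strategy).
export UV_INDEX_STRATEGY=unsafe-best-match
uv pip install torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/test/cu128
```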

If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.

## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
torch 2.7.0 built with CUDA 12.6) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.

1. Use `--extra-index-url https://download.pytorch.org/whl/cu128` to install torch+cu128,
as sketched after this list.
2. Other important indexes at the moment include:
    1. CPU ‒ https://download.pytorch.org/whl/cpu
    2. ROCm ‒ https://download.pytorch.org/whl/rocm6.2.4 and https://download.pytorch.org/whl/rocm6.3
    3. XPU ‒ https://download.pytorch.org/whl/xpu
3. Update .buildkite/release-pipeline.yaml and .buildkite/scripts/upload-wheels.sh to
match the CUDA version from step 1. This makes sure that the vLLM release wheel is tested
on CI.
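
For illustration, a minimal sketch of the corresponding install command; the exact package pins live in vLLM's requirements files and Dockerfiles, so treat this as an assumption-laden example rather than the verbatim build step:

```
# Prefer wheels built against CUDA 12.8 from the PyTorch index, while other
# dependencies continue to resolve from the default PyPI index.
uv pip install --system torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/cu128
```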

## Address long vLLM build time

When building vLLM with a new PyTorch/CUDA version, no cache will exist
in the vLLM sccache S3 bucket, which can cause the CI build job to take more than 5 hours
and time out. Additionally, since vLLM's fastcheck pipeline runs in read-only mode,
it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.

While ongoing efforts like [#17419](https://github.com/vllm-project/vllm/issues/17419)
address the long build time at its source, the current workaround is to set VLLM_CI_BRANCH
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:

1. Increases the timeout limit to 10 hours so that the build doesn't time out.
2. Allows the compiled artifacts to be written to the vLLM sccache S3 bucket
to warm it up so that future builds are faster.

<p align="center" width="100%">
    <img width="60%" src="https://github.com/user-attachments/assets/a8ff0fcd-76e0-4e91-b72f-014e3fdb6b94">
</p>

## Update dependencies

Several vLLM dependencies, such as FlashInfer, also depend on PyTorch and need
to be updated accordingly. Rather than waiting for all of them to publish new
releases (which would take too much time), they can be built from
source to unblock the update process.

### FlashInfer

Here is how to build and install it from source with torch2.7.0+cu128 in the vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):

```
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
export FLASHINFER_ENABLE_SM90=1
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
```

One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl. For future releases, contact the PyTorch release
team if you want to get the package published there.
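
When such a pre-built wheel is available, it can be installed directly instead of compiling FlashInfer from source; this is a sketch assuming the URL above is still valid for the PyTorch/CUDA combination being tested:

```
# Install the pre-built FlashInfer wheel from the public index rather than
# building it, avoiding the roughly 30-minute compile step.
uv pip install --system \
    https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl
```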

### xFormers

Similar to FlashInfer, here is how to build and install xFormers from source:

```
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
MAX_JOBS=16 uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```

### Mamba

Mamba can likewise be built and installed from source:

```
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
```

### causal-conv1d

Similarly, for causal-conv1d:

```
uv pip install 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.0.post8'
```

## Update all the different vLLM platforms

Rather than attempting to update all vLLM platforms in a single pull request, it's more manageable
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to choose
which platforms to update. For instance, updating XPU requires the corresponding
release from https://github.com/intel/intel-extension-for-pytorch by Intel.
While https://github.com/vllm-project/vllm/pull/16859 updated vLLM to PyTorch
2.7.0 on CPU, CUDA, and ROCm, https://github.com/vllm-project/vllm/pull/17444
completed the update for XPU.
