---
title: Update PyTorch version on vLLM OSS CI/CD
---

vLLM's current policy is to always use the latest PyTorch stable
release in CI/CD. It is standard practice to submit a PR to update the
PyTorch version as early as possible when a new [PyTorch stable
release](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence) becomes available.
This process is non-trivial due to the gap between PyTorch
releases. Using [#16859](https://github.com/vllm-project/vllm/pull/16859) as
an example, this document outlines common steps to achieve this update along with
a list of potential issues and how to address them.

## Test PyTorch release candidates (RCs)

Updating PyTorch in vLLM after the official release is not
ideal because any issues discovered at that point can only be resolved
by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM with PyTorch release candidates (RCs) to ensure
compatibility before each release.

PyTorch release candidates can be downloaded from the PyTorch test index at https://download.pytorch.org/whl/test.
For example, the torch2.7.0+cu128 RC can be installed using the following command:

```
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```

When the final RC is ready for testing, it will be announced to the community
on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-announcements).
After this announcement, we can begin testing vLLM integration by drafting a pull request
that follows this three-step process:

1. Update the requirements files in https://github.com/vllm-project/vllm/tree/main/requirements
to point to the new releases of torch, torchvision, and torchaudio.
2. Use `--extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>` to
get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`,
and `rocm6.2.4`.
3. Since vLLM uses uv, make sure that the `unsafe-best-match` index strategy is set, either
via the `UV_INDEX_STRATEGY` environment variable or via `--index-strategy unsafe-best-match`.
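
Putting these steps together, here is a minimal sketch of the RC install command, assuming the cu128 test index; in the actual pull request, the torch, torchvision, and torchaudio versions are pinned in the requirements files rather than installed ad hoc:

```
# Sketch only (assumes the cu128 test index and uv).
# In the real PR, the torch/torchvision/torchaudio pins live in the requirements files.
export UV_INDEX_STRATEGY=unsafe-best-match
uv pip install torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/test/cu128
```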

If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.

## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
torch2.7.0+cu126) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.

1. Use `--extra-index-url https://download.pytorch.org/whl/cu128` to install torch+cu128.
2. Other important indexes at the moment include:
    1. CPU ‒ https://download.pytorch.org/whl/cpu
    2. ROCm ‒ https://download.pytorch.org/whl/rocm6.2.4 and https://download.pytorch.org/whl/rocm6.3
    3. XPU ‒ https://download.pytorch.org/whl/xpu
3. Update `.buildkite/release-pipeline.yaml` and `.buildkite/scripts/upload-wheels.sh` to
match the CUDA version from step 1. This ensures that the released vLLM wheel is tested
on CI.
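
For example, a minimal sketch of what the install command from step 1 could look like in a Dockerfile layer; the real Dockerfiles pin specific versions and requirements files, so treat this as illustrative:

```
# Sketch only: pull the cu128 builds instead of the default cu126 wheels from PyPI.
uv pip install --system torch torchvision torchaudio \
    --extra-index-url https://download.pytorch.org/whl/cu128
```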

## Address long vLLM build time

When building vLLM with a new PyTorch/CUDA version, no cache will exist
in the vLLM sccache S3 bucket, causing the build job on CI to potentially take more than 5 hours
and time out. Additionally, since vLLM's fastcheck pipeline runs in read-only mode,
it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.

While ongoing efforts like [#17419](https://github.com/vllm-project/vllm/issues/17419)
address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:

1. It increases the timeout limit to 10 hours so that the build doesn't time out.
2. It allows the compiled artifacts to be written to the vLLM sccache S3 bucket
to warm it up so that future builds are faster.

<p align="center" width="100%">
    <img width="60%" src="https://github.com/user-attachments/assets/a8ff0fcd-76e0-4e91-b72f-014e3fdb6b94">
</p>

## Update dependencies

Several vLLM dependencies, such as FlashInfer, also depend on PyTorch and need
to be updated accordingly. Rather than waiting for all of them to publish new
releases (which would take too much time), they can be built from
source to unblock the update process.

### FlashInfer

Here is how to build and install it from source with torch2.7.0+cu128 in vLLM's [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):

```
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
export FLASHINFER_ENABLE_SM90=1
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
```

One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl. For future releases, contact the PyTorch release
team if you want to get the package published there.
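
Once such a wheel is cached, it can be installed directly instead of compiling from source; a sketch using the wheel URL above:

```
# Sketch only: install the pre-built FlashInfer wheel from the public index
# instead of building it from source inside the Dockerfile.
uv pip install --system \
    https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl
```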

### xFormers

Similar to FlashInfer, here is how to build and install xFormers from source:

```
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
MAX_JOBS=16 uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```

### Mamba

```
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
```

### causal-conv1d

```
uv pip install 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.0.post8'
```

## Update all the different vLLM platforms

Rather than attempting to update all vLLM platforms in a single pull request, it's more manageable
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to selectively choose
which platforms to update. For instance, updating XPU requires the corresponding
release from https://github.com/intel/intel-extension-for-pytorch by Intel.
While [#16859](https://github.com/vllm-project/vllm/pull/16859) updated vLLM to PyTorch
2.7.0 on CPU, CUDA, and ROCm, [#17444](https://github.com/vllm-project/vllm/pull/17444)
completed the update for XPU.