[WIP][RC] Update PyTorch to 2.8.0 #20358

Draft · wants to merge 13 commits into base: main
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -49,7 +49,7 @@ repos:
rev: 0.6.17
hooks:
- id: pip-compile
-      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match, --torch-backend, cu128]
+      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match, --extra-index-url, https://download.pytorch.org/whl/test/cu128]
Review comment from a Contributor (severity: medium):
There seems to be a redundancy in how the extra index URL for PyTorch is specified.

The --extra-index-url is provided here in the pip-compile arguments, but it has also been added to requirements/test.in.

The comment in the generated requirements/test.txt file indicates that the intended command for regeneration is uv pip compile requirements/test.in ... without the --extra-index-url argument, implying it should be read from the .in file.

To maintain a single source of truth and avoid confusion, it would be best to remove the --extra-index-url argument from this hook and rely on the definition in requirements/test.in. This would make the pre-commit hook's behavior consistent with the documented generation command.

      args: [requirements/test.in, -o, requirements/test.txt, --index-strategy, unsafe-best-match]
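For reference, a sketch of the regeneration this implies (the command text comes from the header of the generated requirements/test.txt in this PR; with the index URL declared in requirements/test.in, no extra flag is needed):

```
# Regenerate the lockfile; the PyTorch test index is picked up from
# the --extra-index-url line declared inside requirements/test.in.
uv pip compile requirements/test.in -o requirements/test.txt \
    --index-strategy unsafe-best-match
```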

files: ^requirements/test\.(in|txt)$
- repo: local
hooks:
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -45,8 +45,8 @@ set(HIP_SUPPORTED_ARCHS "gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1
# requirements.txt files and should be kept consistent. The ROCm torch
# versions are derived from docker/Dockerfile.rocm
#
-set(TORCH_SUPPORTED_VERSION_CUDA "2.7.0")
-set(TORCH_SUPPORTED_VERSION_ROCM "2.7.0")
+set(TORCH_SUPPORTED_VERSION_CUDA "2.8.0")
+set(TORCH_SUPPORTED_VERSION_ROCM "2.8.0")

#
# Try to find python package with an executable that exactly matches
16 changes: 13 additions & 3 deletions docker/Dockerfile
@@ -50,7 +50,7 @@ ARG UV_INDEX_URL=${PIP_INDEX_URL}
ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}

# PyTorch provides its own indexes for standard and nightly builds
-ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl
+ARG PYTORCH_CUDA_INDEX_BASE_URL=https://download.pytorch.org/whl/test
ARG PYTORCH_CUDA_NIGHTLY_INDEX_BASE_URL=https://download.pytorch.org/whl/nightly

# PIP supports multiple authentication schemes, including keyring
@@ -363,6 +363,16 @@ RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist
uv pip install --system dist/*.whl --verbose \
--extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')

+# TODO (huydhn): Remove this once xformers is released for 2.8.0
+# https://pytorch.s3.us-east-1.amazonaws.com/whl/test/cu128/xformers/xformers-0.0.30%2B4cf69f09.d20250708-cp312-cp312-linux_x86_64.whl
+RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
+. /etc/environment
+export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0 12.0'
+uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
+# DEBUG
+python3 -m xformers.info
+BASH
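A possible alternative while that TODO stands (an untested sketch; the wheel URL is the one cited in the TODO comment above and only covers CPython 3.12 on x86_64):

```
# Install the prebuilt test-channel xformers wheel instead of compiling from source
uv pip install --system \
    "https://pytorch.s3.us-east-1.amazonaws.com/whl/test/cu128/xformers/xformers-0.0.30%2B4cf69f09.d20250708-cp312-cp312-linux_x86_64.whl"
```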

# If we need to build FlashInfer wheel before its release:
# $ # Note we remove 7.0 from the arch list compared to the list below, since FlashInfer only supports sm75+
# $ export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0a 12.0'
@@ -376,8 +386,8 @@ RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist
# $ # upload the wheel to a public location, e.g. https://wheels.vllm.ai/flashinfer/v0.2.6.post1/flashinfer_python-0.2.6.post1-cp39-abi3-linux_x86_64.whl

# Allow specifying a version, Git revision or local .whl file
-ARG FLASHINFER_CUDA128_INDEX_URL="https://download.pytorch.org/whl/cu128/flashinfer"
-ARG FLASHINFER_CUDA128_WHEEL="flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
+ARG FLASHINFER_CUDA128_INDEX_URL="https://download.pytorch.org/whl/test/cu128/flashinfer"
+ARG FLASHINFER_CUDA128_WHEEL="flashinfer_python-0.2.6.post1%2Bcu128torch2.8-cp39-abi3-linux_x86_64.whl"
ARG FLASHINFER_GIT_REPO="https://github.com/flashinfer-ai/flashinfer.git"
ARG FLASHINFER_GIT_REF="v0.2.8rc1"
# Flag to control whether to use pre-built FlashInfer wheels (set to false to force build from source)
11 changes: 11 additions & 0 deletions docs/contributing/ci/update_pytorch_version.md
@@ -55,6 +55,17 @@ to point to the new releases for `torch`, `torchvision`, and `torchaudio`.
If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.

+### Update some tests to use PyTorch RC
+
+#### Python-only installation test
+
+Update tests/standalone_tests/python_only_compile.sh to:
+
+```
+VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 VLLM_USE_PRECOMPILED=1 pip3 install -vvv -e . \
+    --extra-index-url https://download.pytorch.org/whl/test/cu128
+```
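A quick sanity check after that install (a sketch, assuming the RC wheel carries the usual +cu128 local version suffix):

```
# Confirm the torch RC was resolved from the test index
python3 -c 'import torch; print(torch.__version__)'   # expected: 2.8.0+cu128
```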

## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ requires = [
"packaging>=24.2",
"setuptools>=77.0.3,<80.0.0",
"setuptools-scm>=8.0",
"torch == 2.7.0",
"torch == 2.8.0",
"wheel",
"jinja2",
]
3 changes: 2 additions & 1 deletion requirements/build.txt
@@ -4,7 +4,8 @@ ninja
packaging>=24.2
setuptools>=77.0.3,<80.0.0
setuptools-scm>=8
-torch==2.7.0
+torch==2.8.0
wheel
jinja2>=3.1.6
regex
build
10 changes: 5 additions & 5 deletions requirements/cpu.txt
@@ -7,18 +7,18 @@ numba == 0.61.2; python_version > '3.9'
# Dependencies for CPUs
packaging>=24.2
setuptools>=77.0.3,<80.0.0
---extra-index-url https://download.pytorch.org/whl/cpu
+--extra-index-url https://download.pytorch.org/whl/test/cpu
torch==2.6.0+cpu; platform_machine == "x86_64" # torch>2.6.0+cpu has performance regression on x86 platform, see https://github.com/pytorch/pytorch/pull/151218
-torch==2.7.0; platform_system == "Darwin"
-torch==2.7.0; platform_machine == "ppc64le" or platform_machine == "aarch64"
+torch==2.8.0; platform_system == "Darwin"
+torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64"

# required for the image processor of minicpm-o-2_6, this must be updated alongside torch
torchaudio; platform_machine != "ppc64le" and platform_machine != "s390x"
-torchaudio==2.7.0; platform_machine == "ppc64le"
+torchaudio==2.8.0; platform_machine == "ppc64le"

# required for the image processor of phi3v, this must be updated alongside torch
torchvision; platform_machine != "ppc64le" and platform_machine != "s390x"
-torchvision==0.22.0; platform_machine == "ppc64le"
+torchvision==0.23.0; platform_machine == "ppc64le"
datasets # for benchmark scripts

# Intel Extension for PyTorch, only for x86_64 CPUs
9 changes: 5 additions & 4 deletions requirements/cuda.txt
@@ -6,9 +6,10 @@ numba == 0.61.2; python_version > '3.9'

# Dependencies for NVIDIA GPUs
ray[cgraph]>=2.43.0, !=2.44.* # Ray Compiled Graph, required for pipeline parallelism in V1.
-torch==2.7.0
-torchaudio==2.7.0
+torch==2.8.0
+torchaudio==2.8.0
# These must be updated alongside torch
-torchvision==0.22.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
+torchvision==0.23.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
+# TODO (huydhn): Re-enable this once xformers is released for 2.8.0
+# https://github.com/facebookresearch/xformers/releases/tag/v0.0.30
-xformers==0.0.30; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.7
+# git+https://github.com/facebookresearch/xformers@v0.0.30; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.7
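Until xformers publishes wheels for torch 2.8.0, the from-source install in docker/Dockerfile is the stopgap; a minimal sketch of the same steps outside Docker:

```
# Build xformers v0.0.30 from source against the torch 2.8.0 RC
# (TORCH_CUDA_ARCH_LIST mirrors the Dockerfile RUN block above)
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0 12.0'
uv pip install --no-build-isolation \
    "git+https://github.com/facebookresearch/xformers@v0.0.30"
python3 -m xformers.info   # sanity-check the resulting build
```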
8 changes: 4 additions & 4 deletions requirements/rocm-build.txt
@@ -1,10 +1,10 @@
# Common dependencies
-r common.txt

---extra-index-url https://download.pytorch.org/whl/rocm6.2.4
-torch==2.7.0
-torchvision==0.22.0
-torchaudio==2.7.0
+--extra-index-url https://download.pytorch.org/whl/test/rocm6.3
+torch==2.8.0
+torchvision==0.23.0
+torchaudio==2.8.0

triton==3.2
cmake>=3.26.1,<4
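To check by hand that the ROCm test index resolves these pins (a sketch, not part of the PR; --dry-run needs pip >= 22.2):

```
# Dry-run resolution of the ROCm 6.3 test wheels; pins and URL from the diff above
pip3 install --dry-run torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 \
    --extra-index-url https://download.pytorch.org/whl/test/rocm6.3
```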
7 changes: 4 additions & 3 deletions requirements/test.in
@@ -22,9 +22,10 @@ sentence-transformers # required for embedding tests
soundfile # required for audio tests
jiwer # required for audio tests
timm # required for internvl test
-torch==2.7.0
-torchaudio==2.7.0
-torchvision==0.22.0
+--extra-index-url https://download.pytorch.org/whl/test/cu128
+torch==2.8.0
+torchaudio==2.8.0
+torchvision==0.23.0
transformers_stream_generator # required for qwen-vl test
mamba_ssm # required for plamo2 test
matplotlib # required for qwen-vl test
38 changes: 19 additions & 19 deletions requirements/test.txt
@@ -1,5 +1,5 @@
# This file was autogenerated by uv via the following command:
-#    uv pip compile requirements/test.in -o requirements/test.txt --index-strategy unsafe-best-match --torch-backend cu128
+#    uv pip compile requirements/test.in -o requirements/test.txt --index-strategy unsafe-best-match
absl-py==2.1.0
# via rouge-score
accelerate==1.0.1
@@ -377,42 +377,42 @@ numpy==1.26.4
# transformers
# tritonclient
# vocos
-nvidia-cublas-cu12==12.8.3.14
+nvidia-cublas-cu12==12.8.4.1
# via
# nvidia-cudnn-cu12
# nvidia-cusolver-cu12
# torch
-nvidia-cuda-cupti-cu12==12.8.57
+nvidia-cuda-cupti-cu12==12.8.90
# via torch
-nvidia-cuda-nvrtc-cu12==12.8.61
+nvidia-cuda-nvrtc-cu12==12.8.93
# via torch
-nvidia-cuda-runtime-cu12==12.8.57
+nvidia-cuda-runtime-cu12==12.8.90
# via torch
-nvidia-cudnn-cu12==9.7.1.26
+nvidia-cudnn-cu12==9.10.2.21
# via torch
-nvidia-cufft-cu12==11.3.3.41
+nvidia-cufft-cu12==11.3.3.83
# via torch
-nvidia-cufile-cu12==1.13.0.11
+nvidia-cufile-cu12==1.13.1.3
# via torch
-nvidia-curand-cu12==10.3.9.55
+nvidia-curand-cu12==10.3.9.90
# via torch
-nvidia-cusolver-cu12==11.7.2.55
+nvidia-cusolver-cu12==11.7.3.90
# via torch
-nvidia-cusparse-cu12==12.5.7.53
+nvidia-cusparse-cu12==12.5.8.93
# via
# nvidia-cusolver-cu12
# torch
-nvidia-cusparselt-cu12==0.6.3
+nvidia-cusparselt-cu12==0.7.1
# via torch
-nvidia-nccl-cu12==2.26.2
+nvidia-nccl-cu12==2.27.3
# via torch
-nvidia-nvjitlink-cu12==12.8.61
+nvidia-nvjitlink-cu12==12.8.93
# via
# nvidia-cufft-cu12
# nvidia-cusolver-cu12
# nvidia-cusparse-cu12
# torch
-nvidia-nvtx-cu12==12.8.55
+nvidia-nvtx-cu12==12.8.90
# via torch
opencensus==0.11.4
# via ray
@@ -762,7 +762,7 @@ tomli==2.2.1
# via schemathesis
tomli-w==1.2.0
# via schemathesis
-torch==2.7.0+cu128
+torch==2.8.0+cu128
# via
# -r requirements/test.in
# accelerate
@@ -781,12 +781,12 @@ torch==2.7.0+cu128
# torchvision
# vector-quantize-pytorch
# vocos
-torchaudio==2.7.0+cu128
+torchaudio==2.8.0+cu128
# via
# -r requirements/test.in
# encodec
# vocos
-torchvision==0.22.0+cu128
+torchvision==0.23.0+cu128
# via
# -r requirements/test.in
# timm
@@ -816,7 +816,7 @@ transformers==4.53.2
# transformers-stream-generator
transformers-stream-generator==0.0.5
# via -r requirements/test.in
-triton==3.3.0
+triton==3.4.0
# via torch
tritonclient==2.51.0
# via
2 changes: 1 addition & 1 deletion tests/distributed/test_sequence_parallel.py
@@ -292,7 +292,7 @@ def _compare_sp(
# TODO support other models
# [LANGUAGE GENERATION]
"meta-llama/Llama-3.2-1B-Instruct",
"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8"
"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8",
]


4 changes: 3 additions & 1 deletion tests/lora/test_chatglm3_tp.py
@@ -95,7 +95,9 @@ def test_chatglm3_lora_tp4_fully_sharded_loras(chatglm3_lora_files):
tensor_parallel_size=4,
trust_remote_code=True,
fully_sharded_loras=True,
-enable_chunked_prefill=True)
+enable_chunked_prefill=True,
+gpu_memory_utilization=0.85,
+cpu_offload_gb=10)
@jeejeelee (Collaborator) commented on Jul 16, 2025:
QQ: Why do we need to set cpu_offload_gb?

Reply from the author (Contributor):
I'm getting OOM when running this test. It's unclear whether it's related to the 2.8 upgrade.
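For anyone trying to reproduce the OOM, a hypothetical invocation (the test path and name come from this diff; the pytest command itself is an assumption, and the test needs 4 GPUs):

```
# Run only the affected fully-sharded LoRA test (requires 4 GPUs)
pytest -v tests/lora/test_chatglm3_tp.py::test_chatglm3_lora_tp4_fully_sharded_loras
```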

output1 = do_sample(llm, chatglm3_lora_files, lora_id=1)
for i in range(len(EXPECTED_LORA_OUTPUT)):
assert output1[i] == EXPECTED_LORA_OUTPUT[i]
4 changes: 3 additions & 1 deletion tests/standalone_tests/python_only_compile.sh
@@ -18,7 +18,9 @@ apt autoremove -y

echo 'import os; os.system("touch /tmp/changed.file")' >> vllm/__init__.py

-VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 VLLM_USE_PRECOMPILED=1 pip3 install -vvv -e .
+# TESTING, TO BE REMOVED
+VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 VLLM_USE_PRECOMPILED=1 pip3 install -vvv -e . \
+    --extra-index-url https://download.pytorch.org/whl/test/cu128

# Run the script
python3 -c 'import vllm'