feat(vllm-tensorizer): Upgrade vLLM version and Resolve Related Build Compatibility Issues #98

Merged · Jun 18, 2025 · 42 commits
Changes from 36 commits

Commits
56eed9d
ci(vllm-tensorizer): Update vLLM source commit in build pipeline
JustinPerlman Jun 6, 2025
1b8b7bb
build(vllm-tensorizer): Update `torch-extras` base image
JustinPerlman Jun 6, 2025
face617
chore: Add .idea/ to .gitignore
JustinPerlman Jun 6, 2025
0ca8228
fix(vllm-tensorizer): Remove redundant CUDA dev package installation
JustinPerlman Jun 9, 2025
1512fdf
fix(vllm-tensorizer): install setuptools_scm and cmake for vLLM build
JustinPerlman Jun 9, 2025
1ccf357
fix(vllm-tensorizer): update triton version to 2.58.0
JustinPerlman Jun 9, 2025
b32bae5
fix(vllm-tensorizer): update triton version to 3.3.1
JustinPerlman Jun 9, 2025
86181c3
fix(vllm-tensorizer): update triton version to 2.3.1
JustinPerlman Jun 9, 2025
3228eb7
fix(vllm-tensorizer): remove explicit triton versioning
JustinPerlman Jun 10, 2025
5c52d8e
feat(vllm-tensorizer): implement custom triton build and install for …
JustinPerlman Jun 10, 2025
3a78a56
fix(vllm-tensorizer): reorder build stages to resolve circular depend…
JustinPerlman Jun 10, 2025
a7e3e19
fix(vllm-tensorizer): Remove accidental backslashes
JustinPerlman Jun 10, 2025
6dcafb5
feat(vllm-tensorizer): Add MAX_JOBS unset logic; remove custom triton…
JustinPerlman Jun 12, 2025
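Commit 6dcafb5's "MAX_JOBS unset logic" deals with a subtlety of build ARGs: an ARG that defaults to the empty string is still *set* (but empty) in the shell, and some build tools reject an empty job count instead of falling back to their own default. A truly unset variable avoids that. A minimal sketch of the distinction (plain POSIX sh, independent of the Dockerfile):

```shell
# An empty MAX_JOBS is still *set*; `unset` removes it entirely, so the
# set-ness test ${MAX_JOBS+set} stops expanding to anything.
MAX_JOBS=""
echo "empty but set: ${MAX_JOBS+set}"
if [ -z "$MAX_JOBS" ]; then unset MAX_JOBS; fi
echo "after unset: ${MAX_JOBS+set}"
```

This prints `empty but set: set` followed by `after unset: `, showing why the Dockerfile unsets the variable rather than passing it through empty.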
9a1a9c1
fix(vllm-builder): Configure CUDA environment variables for vLLM comp…
JustinPerlman Jun 13, 2025
af19873
fix(vllm-tensorizer): Update vLLM commit to a newer version for PyTor…
JustinPerlman Jun 13, 2025
93db31b
fix(vllm-tensorizer): Install missing `regex` module for vLLM build m…
JustinPerlman Jun 13, 2025
f865d51
feat(vllm-tensorizer): Switch to upstream vLLM for PyTorch 2.7.0 comp…
JustinPerlman Jun 13, 2025
ea0074b
feat(vllm-tensorizer): Downgrade vLLM to v0.9.0 for PyTorch 2.7.0 com…
JustinPerlman Jun 13, 2025
7631031
fix(vllm-tensorizer): Apply CMake patch for nvToolsExt linking issue
JustinPerlman Jun 13, 2025
c0b2d0c
fix(vllm-builder): Simplify find_library call in nvToolsExt CMake patch
JustinPerlman Jun 13, 2025
17d917b
fix(vllm-tensorizer): Add missing `)`
JustinPerlman Jun 13, 2025
3bf996b
fix(vllm-tensorizer): Remove Cmake patch
JustinPerlman Jun 13, 2025
ac48891
fix(vllm-tensorizer): Update base image to CUDA 12.8.1 to resolve bui…
JustinPerlman Jun 13, 2025
ef1ebfc
fix(vllm-tensorizer): Set `MAX_JOBS` to 2 to prevent OOM during Flash…
JustinPerlman Jun 13, 2025
5bf13cc
fix(vllm-tensorizer): Increase MAX_JOBS to 8 for faster compilation, …
JustinPerlman Jun 16, 2025
c0f6a04
fix(vllm-tensorizer): Remove xformers constraint to resolve vLLM depe…
JustinPerlman Jun 16, 2025
a932752
fix(vllm-tensorizer): Remove `fschat` installation to resolve `pydant…
JustinPerlman Jun 16, 2025
b872b3e
feat(vllm-tensorizer): Upgrade to PyTorch 2.7.1; Remove commented CUD…
JustinPerlman Jun 16, 2025
c645bd5
fix(vllm-tensorizer): Correct base image tag to align PyTorch, torchv…
JustinPerlman Jun 16, 2025
9f0eaf6
fix(vllm-tensorizer): Correct base image tag to align torchaudio vers…
JustinPerlman Jun 16, 2025
ae288f5
fix(vllm-tensorizer): Use correct and existing base image tag for PyT…
JustinPerlman Jun 16, 2025
a1a8a22
fix(vllm-tensorizer): Use correct and existing base image tag for PyT…
JustinPerlman Jun 16, 2025
aea4d46
fix(vllm-tensorizer): Use 'nccl' compute base image to provide nvcc a…
JustinPerlman Jun 16, 2025
ced54a1
feat(vllm-tensorizer): Add use_existing_torch.py helper and related b…
JustinPerlman Jun 16, 2025
9effc1b
feat(vllm-tensorizer): Update base image to CUDA 12.9.0, PyTorch 2.7.…
JustinPerlman Jun 17, 2025
1057644
fix(vllm-tensorizer): Undelete `FROM scratch AS freezer`
JustinPerlman Jun 17, 2025
092946c
fix(vllm-tensorizer): Remove leftover `TRITON_COMMIT`
JustinPerlman Jun 17, 2025
1b60e61
fix(vllm-tensorizer): Improve Dockerfile ARG passing
JustinPerlman Jun 17, 2025
26479bb
style(vllm-tensorizer): Rename build stage
JustinPerlman Jun 17, 2025
ee71e13
feat(vllm-tensorizer): Install OpenAI-compatible server dependencies
JustinPerlman Jun 17, 2025
6ec3cc6
feat(vllm-tensorizer): Add `flashinfer` build, plus misc. minor changes
Eta0 Jun 17, 2025
2193567
fix(vllm-tensorizer): Use POSIX `sh`-safe string substitution
Eta0 Jun 17, 2025
5 changes: 3 additions & 2 deletions — .github/workflows/vllm-tensorizer.yml

```diff
@@ -18,6 +18,7 @@ jobs:
     with:
       image-name: vllm-tensorizer
      folder: vllm-tensorizer
-      tag-suffix: ${{ inputs.commit || '19307ba71ddeb7e1cc6aec3c1baa8b50d59c1beb'}}
+      tag-suffix: ${{ inputs.commit || 'b6553be1bc75f046b00046a4ad7576364d03c835'}}
       build-args: |
-        COMMIT_HASH=${{ inputs.commit || '19307ba71ddeb7e1cc6aec3c1baa8b50d59c1beb'}}
+        COMMIT_HASH=${{ inputs.commit || 'b6553be1bc75f046b00046a4ad7576364d03c835'}}
+        TRITON_COMMIT=96316ce5
```
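The workflow pins the vLLM source to a default commit whenever no `commit` input is supplied, via the GitHub Actions `||` expression. A rough POSIX-sh analog of that fallback (variable names here are illustrative, not from the workflow):

```shell
# ${VAR:-default} substitutes the default when VAR is unset or empty,
# mirroring the workflow's `inputs.commit || '<hash>'` expression.
DEFAULT_COMMIT='b6553be1bc75f046b00046a4ad7576364d03c835'
COMMIT_HASH="${COMMIT_INPUT:-$DEFAULT_COMMIT}"
echo "COMMIT_HASH=$COMMIT_HASH"
```

With `COMMIT_INPUT` unset, this echoes the pinned default; with it exported, the override wins.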
3 changes: 3 additions & 0 deletions — .gitignore

```diff
@@ -162,3 +162,6 @@ flycheck_*.el
 .env*
 .environment
 .environment*
+
+# JetBrains Idea files
+.idea/
```
63 changes: 25 additions & 38 deletions — vllm-tensorizer/Dockerfile

```diff
@@ -1,66 +1,57 @@
-ARG BASE_IMAGE="ghcr.io/coreweave/ml-containers/torch-extras:es-22.04-58a49a2-base-cuda12.1.1-torch2.1.2-vision0.16.2-audio2.1.2-flash_attn2.4.2"
-
-FROM scratch as freezer
+ARG BASE_IMAGE="ghcr.io/coreweave/ml-containers/torch-extras:es-compute-12.0-67208ca-nccl-cuda12.9.0-ubuntu22.04-nccl2.27.3-1-torch2.7.1-vision0.22.1-audio2.7.1-abi1"
+FROM scratch AS freezer
 WORKDIR /
 COPY --chmod=755 freeze.sh /
 
-FROM ${BASE_IMAGE} as builder-base
-
-ARG MAX_JOBS=""
-
-# Dependencies requiring NVCC are built ahead of time in a separate stage
-# so that the ~2 GiB dev library installations don't have to be included
-# in the final image.
-RUN export \
-        CUDA_MAJOR_VERSION=$(echo $CUDA_VERSION | cut -d. -f1) \
-        CUDA_MINOR_VERSION=$(echo $CUDA_VERSION | cut -d. -f2) && \
-    export \
-        CUDA_PACKAGE_VERSION="${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}" && \
-    apt-get -qq update && apt-get install -y --no-install-recommends \
-        cuda-nvcc-${CUDA_PACKAGE_VERSION} \
-        cuda-nvml-dev-${CUDA_PACKAGE_VERSION} \
-        libcurand-dev-${CUDA_PACKAGE_VERSION} \
-        libcublas-dev-${CUDA_PACKAGE_VERSION} \
-        libcusparse-dev-${CUDA_PACKAGE_VERSION} \
-        libcusolver-dev-${CUDA_PACKAGE_VERSION} \
-        cuda-nvprof-${CUDA_PACKAGE_VERSION} \
-        cuda-profiler-api-${CUDA_PACKAGE_VERSION} \
-        libaio-dev \
-        ninja-build && \
-    apt-get clean
+FROM ${BASE_IMAGE} AS builder-base
+
+ARG MAX_JOBS="16"
 
 RUN ldconfig
 
 RUN apt-get -qq update && \
     apt-get -qq install -y --no-install-recommends \
-        python3-pip git ninja-build && \
+        python3-pip git ninja-build cmake && \
     apt-get clean && \
-    pip3 install -U --no-cache-dir pip packaging setuptools wheel
+    pip3 install -U --no-cache-dir pip packaging setuptools wheel setuptools_scm regex
 
-FROM alpine/git:2.36.3 as vllm-downloader
+FROM alpine/git:2.36.3 AS vllm-downloader
 WORKDIR /git
 ARG COMMIT_HASH
 RUN git clone --filter=blob:none --depth 1 --no-single-branch --no-checkout \
-        https://github.com/coreweave/vllm.git && \
+        https://github.com/vllm-project/vllm.git && \
     cd vllm && \
     git checkout "${COMMIT_HASH}" && \
     git submodule update --init --recursive --jobs 8 \
         --depth 1 --filter=blob:none
 
-FROM builder-base as vllm-builder
-
+FROM builder-base AS vllm-builder
 WORKDIR /workspace
 
 RUN --mount=type=bind,from=vllm-downloader,source=/git/vllm,target=/workspace,rw \
     --mount=type=bind,from=freezer,target=/tmp/frozen,rw \
     /tmp/frozen/freeze.sh torch torchaudio torchvision xformers > /tmp/frozen/constraints.txt && \
-    LIBRARY_PATH="/usr/local/cuda/lib64/stubs${LIBRARY_PATH:+:$LIBRARY_PATH}" \
+    if [ -z "$MAX_JOBS" ]; then unset MAX_JOBS; fi && \
+    python3 -m pip install --no-cache-dir py-cpuinfo && \
+    if [ -f 'use_existing_torch.py' ]; then \
+        python3 use_existing_torch.py; \
+    else \
+        git cat-file blob \
+            e489ad7a210f4234db696d1f2749d5f3662fa65b:use_existing_torch.py \
+            | python3 -; \
+    fi && \
+    USE_CUDNN=1 USE_CUSPARSELT=1 \
+    LIBRARY_PATH="/usr/local/cuda/lib64:${LIBRARY_PATH:+:$LIBRARY_PATH}" \
+    CUDA_TOOLKIT_ROOT_DIR="/usr/local/cuda" \
     python3 -m pip wheel -w /wheels \
         -v --no-cache-dir --no-build-isolation --no-deps \
         -c /tmp/frozen/constraints.txt \
         ./
 
 WORKDIR /wheels
 
-FROM ${BASE_IMAGE} as base
+FROM ${BASE_IMAGE} AS base
 
 WORKDIR /workspace
```
```diff
@@ -69,10 +60,6 @@ RUN apt-get -qq update && apt-get install -y --no-install-recommends curl && apt
 RUN --mount=type=bind,from=freezer,target=/tmp/frozen \
     /tmp/frozen/freeze.sh torch torchaudio torchvision xformers > /tmp/constraints.txt
 
-RUN python3 -m pip install --no-cache-dir \
-    "fschat[model_worker] == 0.2.30" "triton == 2.1.0" \
-    -c /tmp/constraints.txt
-
 RUN --mount=type=bind,from=vllm-builder,source=/wheels,target=/tmp/wheels \
     python3 -m pip install --no-cache-dir /tmp/wheels/*.whl -c /tmp/constraints.txt && \
     rm /tmp/constraints.txt
```
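The `freeze.sh` script (mounted from the `freezer` stage) pins the already-installed torch-family packages as pip constraints, so the vLLM wheel build cannot silently swap in different versions. Its contents are not shown in this diff; a hypothetical stand-in that produces the same kind of `pkg==version` constraint lines:

```shell
# Hypothetical equivalent of freeze.sh (the real script is not in this
# diff): emit a `pkg==version` constraint line for each named package
# that is installed, looked up via Python's importlib.metadata.
freeze() {
    for pkg in "$@"; do
        v="$(python3 -c "import importlib.metadata as m; print(m.version('$pkg'))" 2>/dev/null)" \
            && printf '%s==%s\n' "$pkg" "$v"
    done
}
# Usage mirroring the Dockerfile:
#   freeze torch torchaudio torchvision xformers > constraints.txt
```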