feat(vllm-tensorizer): Upgrade vLLM version and Resolve Related Build Compatibility Issues #98

Conversation
This commit updates the default `VLLM_COMMIT_HASH` used in the GitHub Actions workflow for the vllm-tensorizer image. This change points the build to a more recent commit of the vLLM project.
Commits:
- …ch 2.7.0 compatibility
- …Attention compilation
- …balancing OOM risk
- …A dev install; Set MAX_JOBS to 10
- …ision, and torchaudio versions
- …ion with PyTorch 2.7.1; uppercase all `as` keywords
- …orch 2.7.1 and CUDA 12.8.1
- …nd CUDA dev tools
- …uild flags for PyTorch compatibility; up `MAX_JOBS` to 16
- …1 variant; revert xformers constraint removal
How does the result of this build compare to the official vLLM image? Do we have all the same libraries, etc.? https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile
The CI here needs updates to build new versions without manually specifying the versions of the base images to use.
I would also like to rename this to vllm-openai to align with the vLLM project's upstream image.
Co-authored-by: Eta <24918963+Eta0@users.noreply.github.com>
@JustinPerlman I think the way to go here, to better gather the different workflow variables (vLLM version, base image version, etc.) in one place, is to move to a matrix build strategy, even if the matrix only has a single element. We can work on this together; there is a template I put together for doing this.
@arsenetar, about this:
What would you see as not manually specifying it? Automatically pulling the most recent image from
I would expect the CI to set the ARG for the base image; there are likely cases where we will want to build a matrix of versions, similar to the other torch images. Minimally, that is something that should be changed to set things up for the CI to define or pass in. With respect to upstream images breaking things and requiring updates, that is somewhat expected to happen. There is a difference between development builds and builds that should be considered validated releases, and this repo's package structure does not really help delineate that difference.
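A matrix build along the lines discussed here might look roughly like the following sketch. The job layout, input names, base-image tag, and commit value are all hypothetical placeholders, not this repo's actual workflow:

```yaml
# Hypothetical single-element matrix for the vllm-tensorizer build.
# Adding another entry under `include` would build a second version combination.
jobs:
  build:
    strategy:
      matrix:
        include:
          - base-image: ghcr.io/coreweave/ml-containers/torch-extras:example-tag  # placeholder tag
            vllm-commit: main  # placeholder ref; normally a pinned commit hash
    uses: ./.github/workflows/build.yml  # assumed reusable workflow
    with:
      image-name: vllm-tensorizer
      build-args: |
        BASE_IMAGE=${{ matrix.base-image }}
        VLLM_COMMIT=${{ matrix.vllm-commit }}
```

This keeps all the version knobs in one `matrix` block, so the CI defines the base image and vLLM commit rather than the Dockerfile hard-coding them.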
Co-authored-by: Eta <esyra@coreweave.com>
Co-authored-by: Eta <esyra@coreweave.com>
@JustinPerlman Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/15714183890
Co-authored-by: Eta <esyra@coreweave.com>
This commit additionally renames `VLLM_COMMIT_HASH` to `VLLM_COMMIT`, makes git clones more efficient, and reformats some YAML lists.
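One common way to make such clones cheaper is to fetch only the pinned commit instead of the repository's full history. A hedged Dockerfile sketch (the paths are illustrative, and the commit value is supplied by the workflow, not shown here):

```dockerfile
# Hypothetical fragment: shallow-fetch only the pinned vLLM commit
# rather than cloning the full history of the repository.
ARG VLLM_COMMIT
RUN git init /workspace/vllm && \
    cd /workspace/vllm && \
    git remote add origin https://github.com/vllm-project/vllm.git && \
    git fetch --depth=1 origin "${VLLM_COMMIT}" && \
    git checkout --detach FETCH_HEAD
```

Fetching with `--depth=1` on a single ref downloads only the objects reachable from that commit, which is substantially less data than a full clone.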
As far as I can tell, the only libraries we don't install that they do are
Out of all of that, I could only think of ffmpeg as being of value, if it's used by libraries for multimodal support.
This would create a new package listing alongside
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/15718383514 |
Tested; LGTM.
This PR focuses on updating the vLLM version used within the `vllm-tensorizer` Docker image and resolving the various bugs and compatibility issues that arose from this upgrade.

Key Changes:
- Updated the base image to `ghcr.io/coreweave/ml-containers/torch-extras:es-compute-12.0-67208ca-nccl-cuda12.9.0-ubuntu22.04-nccl2.27.3-1-torch2.7.1-vision0.22.1-audio2.7.1-abi1`. This moves from `torch:base` to `torch:nccl` for CUDA development tools and aligns with PyTorch 2.7.1 and CUDA 12.9.0.
- Fixed `FileNotFoundError: .../nvcc` and `CUDA::nvToolsExt not found` errors by ensuring the correct `torch:nccl` base image is used.
- Resolved `PyTorch version expected` errors by integrating `use_existing_torch.py` to handle vLLM's PyTorch version expectations.
- Set `MAX_JOBS=16`.
- Improved `Dockerfile` clarity by consistently uppercasing `AS` keywords.
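The `use_existing_torch.py` integration and `MAX_JOBS` setting described above could be combined in a Dockerfile step along these lines. The working directory and install flags are assumptions for illustration, not the exact contents of this repo's Dockerfile:

```dockerfile
# Hypothetical fragment: strip vLLM's pinned torch requirements so the
# build reuses the PyTorch 2.7.1 already present in the torch:nccl base
# image, then build with a bounded number of parallel compile jobs.
ARG MAX_JOBS=16
RUN cd /workspace/vllm && \
    python use_existing_torch.py && \
    MAX_JOBS=${MAX_JOBS} pip install --no-build-isolation -e .
```

`use_existing_torch.py` rewrites vLLM's requirements so the build does not try to reinstall a different PyTorch, and `--no-build-isolation` keeps the compile step pointed at the torch already in the image; capping `MAX_JOBS` trades build speed against the OOM risk mentioned in the commit history.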