
feat!: Update to new runners, update PyTorch, add ARM builds, add Blackwell support, add sglang image #87


Merged
merged 95 commits on Feb 22, 2025
Changes from 93 commits
95 commits
9a5990d
feat(torch): Update PyTorch to v2.5.1 & update CUDA 12.6
Eta0 Nov 4, 2024
eec6daa
feat(torch): Update `torch:nccl` base images
Eta0 Nov 4, 2024
e411a3d
ci(torch): Change `torch:nccl` matrix build layout
Eta0 Nov 5, 2024
8612038
ci: Update action versions
Eta0 Nov 6, 2024
f58c927
ci: Use remote BuildKit worker & new runners
Eta0 Nov 6, 2024
a3f1d78
ci(torch-nightly): Only filter specific fields from configs' `include`
Eta0 Nov 6, 2024
e6bb688
ci(torch-nightly): Change `del()` syntax in `yq` filter
Eta0 Nov 6, 2024
4fe66e2
ci(torch-nightly): Treat `include` as an array in `yq` filter
Eta0 Nov 6, 2024
1c31940
ci(torch-nightly): Exclude `ubuntu20.04` from `torch-nightly` builds
Eta0 Nov 6, 2024
092a837
ci(torch-nightly): Filter out entire `include` key for `torch:base`
Eta0 Nov 6, 2024
3448307
ci: Build for multiple architectures
Eta0 Nov 7, 2024
3b6b17d
ci: Build only for `linux/amd64`
Eta0 Nov 7, 2024
35c959b
fix(torch): Include a post-v2.5.1 bugfix patch when building PyTorch
Eta0 Nov 13, 2024
0c5c0f0
Merge branch 'es/torch-updates' into es/actions
Eta0 Nov 18, 2024
cdafd6b
feat(torch): Parameterize `compiler_wrapper.f95`
Eta0 Nov 18, 2024
e0bd93a
fix(torch): Enable preprocessor when compiling `compiler_wrapper.f95`
Eta0 Nov 18, 2024
074edd5
build(torch): Make the build process less architecture-dependent
Eta0 Nov 18, 2024
dc97cd6
fix(torch): Use `cmake` after installing it instead of before
Eta0 Nov 18, 2024
85f3b0e
build(torch): Allow customizing `-march` with `--build-arg`s
Eta0 Nov 18, 2024
68121aa
build(torch): Allow customizing `MAX_JOBS` as a build arg
Eta0 Nov 18, 2024
d35022b
build(torch): Don't apply custom `MAX_JOBS` to `flash-attn` build
Eta0 Nov 18, 2024
8305f7f
build(torch): Line-buffer `grep` output when building PyTorch
Eta0 Nov 18, 2024
daa5a4b
build(torch): Filter more output when building PyTorch
Eta0 Nov 18, 2024
c940ad4
build(torch): Allow customizing TransformerEngine build arches
Eta0 Nov 18, 2024
863fca5
feat(torch): Add `flash-attn` 3 beta
Eta0 Nov 18, 2024
b4ce2da
build(torch): Filter even more output when building PyTorch
Eta0 Nov 18, 2024
87c22ff
build(torch-extras): Configure `compiler_wrapper.f95` parameters
Eta0 Nov 18, 2024
9cb44f4
build(torch-extras): Allow overriding `MAX_JOBS` and `NVCC_APPEND_FLAGS`
Eta0 Nov 18, 2024
4a6f2a7
fix(torch-extras): Add missing `+` in parameter expansion
Eta0 Nov 18, 2024
1dfcbc1
feat(torch-extras): Build DeepSpeed-Kernels
Eta0 Nov 18, 2024
8a44533
fix(torch-extras): Use separate build argument for DS-Kernels arches
Eta0 Nov 18, 2024
a3e444d
build(torch-extras): Install `py-cpuinfo` before building DeepSpeed
Eta0 Nov 18, 2024
cc76549
build(torch): Allow setting `MAX_JOBS` when building Triton
Eta0 Nov 18, 2024
a444c44
build(torch): Allow configuring `NVCC_APPEND_FLAGS` as a build argument
Eta0 Nov 18, 2024
a8bbb8a
build(torch-extras): Don't install `cuda-nvprof` before building
Eta0 Nov 18, 2024
de9a3de
build(torch): Conditionally enable `USE_PRIORITIZED_TEXT_FOR_LD`
Eta0 Nov 18, 2024
a889d61
build(torch): Don't use `lld` on `aarch64`
Eta0 Nov 18, 2024
86c27ff
build(torch): Set `-eo pipefail` for the PyTorch build command
Eta0 Nov 18, 2024
3b14ffb
feat(torch): Update TransformerEngine to v1.12
Eta0 Nov 19, 2024
bb07aed
build(torch): Use `ccache` more often
Eta0 Nov 19, 2024
65433ac
feat(torch): Compile `flash-attn` 3 as a separate package
Eta0 Nov 19, 2024
69be6f1
build(torch): Use tabs for heredoc indentation
Eta0 Nov 19, 2024
ecfac12
build(torch): Invoke `fa-build.sh` correctly
Eta0 Nov 19, 2024
a7d7e04
build(torch): Use tabs for other heredocs
Eta0 Nov 19, 2024
4e48edb
build(torch): Use the resulting artifact from `flash-attn-3-builder`
Eta0 Nov 19, 2024
ac1610a
build(torch): Filter more lines while building `flash-attn`
Eta0 Nov 19, 2024
27f1964
build(torch): Build `flash-attn` and `flash-attn` 3 in sequence
Eta0 Nov 19, 2024
83367bc
fix(torch): Fix typo in bind mount's `source=` parameter
Eta0 Nov 19, 2024
915e47e
fix(torch): Use `exit 1` instead of `exit -1`
Eta0 Nov 19, 2024
7162b44
fix(torch): Broaden criteria to apply PyTorch patch
Eta0 Nov 26, 2024
d6e73e7
feat(torch): Force compilation for compute capability 9.0a
Eta0 Nov 26, 2024
b50c6f2
fix(torch): Restore original criteria to apply PyTorch v2.5.1 patch
Eta0 Nov 26, 2024
626b44d
feat(torch): Specify string preprocessor definitions correctly
Eta0 Nov 26, 2024
762021f
fix(torch): Install `pybind11` before attempting to build Triton
Eta0 Nov 26, 2024
04cfc69
build(torch): Add missing `$` in `MAX_JOBS` default for TE
Eta0 Nov 26, 2024
e6dac89
ci(torch): Drop CUDA 12.2.2 build
Eta0 Nov 26, 2024
56f06ed
feat(torch-extras): Update Apex to `a1df804`
Eta0 Nov 26, 2024
bcd5fab
feat(torch): Update `torch:nccl` base images for HPC-X v2.21
Eta0 Dec 3, 2024
c14235b
ci: Update to newer self-hosted runners
Eta0 Dec 3, 2024
4f64cf1
ci(torch): Update CUDA 12.6 builds to 12.6.3; update `torch:nccl` bases
Eta0 Dec 3, 2024
0dcf27d
fix(torch): Edit `flash-attn` 3 installation for compatibility with TE
Eta0 Dec 3, 2024
fd6df40
fix(torch): Add redundant interpreter specification for compatibility
Eta0 Dec 16, 2024
ebaf5aa
feat(torch): Update LLVM components, including `libomp` runtime library
Eta0 Dec 16, 2024
ed93c44
ci: Remove deprecated BuildKit runner endpoint
Eta0 Dec 16, 2024
2da2fb1
feat(torch): Upgrade TransformerEngine to v1.13
Eta0 Dec 24, 2024
09ed200
fix(torch): Add `-ffree-line-length-512` to `gfortran` invocations
Eta0 Dec 24, 2024
8f95f59
feat(torch): Update `flash-attention` 2 & 3 to v2.7.2
Eta0 Jan 9, 2025
8e29075
ci(torch): Drop Ubuntu 20.04 CI builds
Eta0 Jan 10, 2025
459aa23
ci: Re-enable multi-arch builds
Eta0 Jan 29, 2025
cd1019b
build(torch): Add new build targets with CUDA 12.8.0
Eta0 Jan 29, 2025
643b362
ci(torch): Update `nccl-tests` commit hash
Eta0 Jan 29, 2025
b58974c
build(torch): Filter `compute_100` build on older CUDA versions
Eta0 Jan 29, 2025
7095c59
build(torch): Switch `NVCC_APPEND_FLAGS` to not be an `ENV` directive
Eta0 Jan 29, 2025
440d844
ci: Build only for `linux/amd64` again
Eta0 Jan 30, 2025
6bc6fb6
feat(torch): Build with PyTorch v2.6.0
Eta0 Jan 30, 2025
ba41ff9
feat(torch): Build with `flash-attn` v2.7.4.post1
Eta0 Jan 30, 2025
25e8a9e
build(torch): Build both CXX11 ABI variants
Eta0 Jan 30, 2025
77574c2
ci(torch): Remove parameterization of `TORCH_CUDA_ARCH_LIST`
Eta0 Jan 30, 2025
486738e
build(torch): Downgrade `flash-attn` 3 to the 2.7.2.post1 tag
Eta0 Jan 30, 2025
2da31d9
ci: Re-enable ARM64 builds again
Eta0 Feb 1, 2025
ac7f89d
ci(torch): Increase `torch` image build job timeout
Eta0 Feb 1, 2025
f9ffd6f
ci(torch): Increase all job timeouts
Eta0 Feb 1, 2025
fb567b8
ci(torch): Remove `torch`-specific job timeout override
Eta0 Feb 1, 2025
0bd8996
build(torch-extras): Specify DeepSpeed build flags better
Eta0 Feb 3, 2025
386fabe
build(torch-extras): Remove `DS_ACCELERATOR` specification
Eta0 Feb 3, 2025
45dd5a0
build(torch): Enable less-hacky 10.0 arch support in PyTorch
Eta0 Feb 7, 2025
90d178b
feat(sglang): Add `sglang` image
Eta0 Feb 7, 2025
855d2f3
build(sglang): Use `USE_CUDNN` and `USE_CUSPARSELT` flags in vLLM build
Eta0 Feb 7, 2025
dc70ca8
fix(sglang): Remove extraneous `rmdir` build step
Eta0 Feb 7, 2025
f113d38
fix(sglang): Skip `apt` prompts
Eta0 Feb 7, 2025
f63ddbe
ci(torch-nightly): Update runner image version
Eta0 Feb 14, 2025
868a611
ci: Parameterize build platforms
Eta0 Feb 14, 2025
0e44116
fix(torch): Filter 10.0 arch builds on unsupported CUDA versions again
Eta0 Feb 14, 2025
2f37df6
fix(torch): Filter 10.0 arch builds in TransformerEngine build
Eta0 Feb 14, 2025
68fbfd1
ci(torch): Rework logic for passing various build arguments
Eta0 Feb 14, 2025
11 changes: 6 additions & 5 deletions .github/configurations/torch-base.yml
@@ -1,6 +1,7 @@
-cuda: [ 12.6.1, 12.4.1, 12.2.2 ]
-os: [ ubuntu22.04, ubuntu20.04 ]
+cuda: [ 12.8.0, 12.6.3, 12.4.1 ]
+os: [ ubuntu22.04 ]
+abi: [ 1, 0 ]
 include:
-  - torch: 2.5.0
-    vision: 0.20.0
-    audio: 2.5.0
+  - torch: 2.6.0
+    vision: 0.21.0
+    audio: 2.6.0
44 changes: 8 additions & 36 deletions .github/configurations/torch-nccl.yml
@@ -1,37 +1,9 @@
-image:
-  # Ubuntu 22.04
-  - cuda: 12.6.1
-    cudnn: cudnn
-    os: ubuntu22.04
-    nccl: 2.23.4-1
-    nccl-tests-hash: 2ff05b2
-  - cuda: 12.4.1
-    cudnn: cudnn
-    os: ubuntu22.04
-    nccl: 2.23.4-1
-    nccl-tests-hash: 2ff05b2
-  - cuda: 12.2.2
-    cudnn: cudnn8
-    os: ubuntu22.04
-    nccl: 2.23.4-1
-    nccl-tests-hash: 2ff05b2
-  # Ubuntu 20.04
-  - cuda: 12.6.1
-    cudnn: cudnn
-    os: ubuntu20.04
-    nccl: 2.23.4-1
-    nccl-tests-hash: 2ff05b2
-  - cuda: 12.4.1
-    cudnn: cudnn
-    os: ubuntu20.04
-    nccl: 2.23.4-1
-    nccl-tests-hash: 2ff05b2
-  - cuda: 12.2.2
-    cudnn: cudnn8
-    os: ubuntu20.04
-    nccl: 2.21.5-1
-    nccl-tests-hash: 2ff05b2
+cuda: [ 12.8.0, 12.6.3, 12.4.1 ]
+os: [ ubuntu22.04 ]
+abi: [ 1, 0 ]
 include:
-  - torch: 2.5.0
-    vision: 0.20.0
-    audio: 2.5.0
+  - torch: 2.6.0
+    vision: 0.21.0
+    audio: 2.6.0
+    nccl: 2.25.1-1
+    nccl-tests-hash: 57fa979
63 changes: 48 additions & 15 deletions .github/workflows/build.yml
@@ -19,6 +19,11 @@ on:
         required: false
         description: "Optional sub-key to append to the image name for build layer caching"
         type: string
+      platforms:
+        required: false
+        description: "Platforms for which to build (default: linux/amd64,linux/arm64)"
+        type: string
+        default: linux/amd64,linux/arm64
     outputs:
       outcome:
         description: "The outcome of the build"
@@ -33,26 +38,42 @@ on:
 jobs:
   build:
     name: Build Images
-    runs-on: [ self-hosted, Linux ]
+    runs-on: [ cw ]
+    container: 'ghcr.io/coreweave/github-actions-images/github-base-runner:v1.9.0'
+    timeout-minutes: 960
+    defaults:
+      run:
+        shell: bash
     outputs:
       outcome: ${{ steps.docker-build.outcome }}
       tags: ${{ steps.meta.outputs.tags }}
       version: ${{ steps.meta.outputs.version }}
     steps:
-      - uses: actions/checkout@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2.2.1
-      - name: Login to GitHub container registry
-        uses: docker/login-action@v2.2.0
+      - uses: actions/checkout@v4
+      - name: Fetch BuildKit Client Certs
+        uses: dopplerhq/secrets-fetch-action@v1.2.0
+        id: client-certs
         with:
-          registry: ghcr.io
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - name: Login to DockerHub container registry
-        uses: docker/login-action@v2.2.0
+          doppler-token: ${{ secrets.ORG_BUILDKIT_CLIENT_TOKEN }}
+          doppler-project: ${{ secrets.BUILDKIT_CONSUMER_DOPPLER_PROJECT }}
+          doppler-config: prod
+          inject-env-vars: false
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3.7.1
         with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_PASSWORD }}
+          driver: remote
+          endpoint: ${{ secrets.BUILDKIT_CONSUMER_AMD64_ENDPOINT }}
+          platforms: linux/amd64
+          append: |
+            - endpoint: ${{ secrets.BUILDKIT_CONSUMER_ARM64_ENDPOINT }}
+              platforms: linux/arm64
+        env:
+          BUILDER_NODE_0_AUTH_TLS_CACERT: ${{ steps.client-certs.outputs.TLS_CACERT }}
+          BUILDER_NODE_0_AUTH_TLS_CERT: ${{ steps.client-certs.outputs.TLS_CERT }}
+          BUILDER_NODE_0_AUTH_TLS_KEY: ${{ steps.client-certs.outputs.TLS_KEY }}
+          BUILDER_NODE_1_AUTH_TLS_CACERT: ${{ steps.client-certs.outputs.TLS_CACERT }}
+          BUILDER_NODE_1_AUTH_TLS_CERT: ${{ steps.client-certs.outputs.TLS_CERT }}
+          BUILDER_NODE_1_AUTH_TLS_KEY: ${{ steps.client-certs.outputs.TLS_KEY }}
       - name: Get base registry
         run: |
           echo "REGISTRY=ghcr.io/${GITHUB_REPOSITORY,,}" >> $GITHUB_ENV
@@ -70,14 +91,21 @@ jobs:
           echo "CACHE_KEY=${{ inputs.image-name }}-${{ inputs.cache-key }}" >> $GITHUB_ENV
       - name: Extract metadata (tags, labels) for Docker
         id: meta
-        uses: docker/metadata-action@v4.1.1
+        uses: docker/metadata-action@v5.5.1
         with:
           images: ${{ env.REGISTRY }}/${{ inputs.image-name }}
           tags: |
             type=sha,prefix=${{ env.TAG_PREFIX }},suffix=${{ env.TAG_SUFFIX }},format=short
+      - name: Initialize registry credentials file
+        env:
+          USER: ${{ github.actor }}
+          PASS: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          jq -n '.auths."ghcr.io" = { username: env.USER, password: env.PASS }' \
+            | install -m400 /dev/stdin ~/.docker/config.json
       - name: Build and push Docker image
         id: docker-build
-        uses: docker/build-push-action@v3.2.0
+        uses: docker/build-push-action@v6.9.0
         with:
           context: ${{ inputs.folder }}
           build-args: |-
@@ -87,6 +115,11 @@ jobs:
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=registry,ref=${{ env.REGISTRY }}/buildcache:${{ env.CACHE_KEY || inputs.image-name }}
           cache-to: type=registry,ref=${{ env.REGISTRY }}/buildcache:${{ env.CACHE_KEY || inputs.image-name }},mode=max
+          platforms: ${{ inputs.platforms }}
+      - name: Clear registry credentials
+        if: always()
+        run: |
+          rm -f ~/.docker/config.json && [ ! -e ~/.docker/config.json ]
       - uses: 8BitJonny/gh-get-current-pr@2.1.3
         id: PR
         with:
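
Note on the credentials steps in build.yml above: the new workflow writes a registry credentials file with owner-read-only permissions before the build, and removes it in an `if: always()` step afterwards. The following is a minimal shell sketch of that write-then-clear pattern, not the workflow's actual code. The function names `write_credentials` and `clear_credentials` and the temporary directory are hypothetical, and `printf` stands in for the workflow's `jq` call so the sketch has no extra dependencies (unlike `jq`, `printf` does no JSON escaping).

```shell
#!/bin/sh
# Hypothetical helpers illustrating the pattern; the workflow itself uses
# jq and writes to ~/.docker/config.json.

write_credentials() {
    # $1 = target file, $2 = username, $3 = password
    # Assumption: the values contain no quotes or backslashes, because
    # printf performs no JSON escaping (jq, as used in the workflow, does).
    mkdir -p "$(dirname "$1")"
    printf '{"auths":{"ghcr.io":{"username":"%s","password":"%s"}}}\n' "$2" "$3" \
        | install -m 400 /dev/stdin "$1"
}

clear_credentials() {
    # Succeeds only if the file is really gone afterwards,
    # mirroring the `rm -f ... && [ ! -e ... ]` check in the workflow.
    rm -f "$1" && [ ! -e "$1" ]
}

# Demonstration in a throwaway directory instead of the real home directory.
demo_dir=$(mktemp -d)
cred_file="$demo_dir/.docker/config.json"
write_credentials "$cred_file" "example-user" "example-token"
ls -l "$cred_file" | cut -c1-10   # permission bits of the new file
clear_credentials "$cred_file" && echo "credentials removed"
rm -rf "$demo_dir"
```

Piping into `install -m 400 /dev/stdin` creates the file with its final permissions in one step, so there is no window in which the token sits in a file with default permissions.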
8 changes: 6 additions & 2 deletions .github/workflows/read-configuration.yml
@@ -17,12 +17,16 @@ on:
 jobs:
   read-file:
     name: Read Configuration File
-    runs-on: ["self-hosted", "Linux"]
+    runs-on: [ cw ]
+    container: 'ghcr.io/coreweave/github-actions-images/github-base-runner:v1.4.0'
+    defaults:
+      run:
+        shell: bash
     permissions: {}
     outputs:
       config: ${{ steps.read.outputs.contents }}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Read configuration
         id: read
         env:
30 changes: 30 additions & 0 deletions .github/workflows/sglang.yml
@@ -0,0 +1,30 @@
+on:
+  workflow_dispatch:
+    inputs:
+      tag:
+        description: 'Tag for the build'
+        required: true
+      base-image:
+        description: 'Base image from which to build'
+        required: true
+      builder-image:
+        description: 'Image to use to compile wheels, if different from the base image'
+        required: false
+  push:
+    paths:
+      - "sglang/**"
+      - ".github/workflows/sglang.yml"
+      - ".github/workflows/build.yml"
+
+
+jobs:
+  build:
+    uses: ./.github/workflows/build.yml
+    secrets: inherit
+    with:
+      image-name: sglang
+      folder: sglang
+      tag-suffix: ${{ inputs.tag || '386fabe-nccl-cuda12.8.0-ubuntu22.04-nccl2.25.1-1-torch2.6.0-vision0.21.0-audio2.6.0-abi1' }}
+      build-args: |
+        BASE_IMAGE=${{ inputs.base-image || 'ghcr.io/coreweave/ml-containers/torch-extras:es-actions-386fabe-nccl-cuda12.8.0-ubuntu22.04-nccl2.25.1-1-torch2.6.0-vision0.21.0-audio2.6.0-abi1'}}
+        ${{ inputs.base-image && 'BASE_IMAGE=' }}${{ inputs.base-image}}
3 changes: 2 additions & 1 deletion .github/workflows/torch-base.yml
@@ -35,11 +35,12 @@ jobs:
     secrets: inherit
     with:
       image-name: ${{ inputs.image-name }}
-      tag: ${{ format('{0}-{1}', format('base-cuda{0}-{1}', matrix.cuda, matrix.os), inputs.image-tag-suffix || format('torch{0}-vision{1}-audio{2}', matrix.torch, matrix.vision, matrix.audio)) }}
+      tag: ${{ format('{0}-{1}', format('base-cuda{0}-{1}', matrix.cuda, matrix.os), inputs.image-tag-suffix || format('torch{0}-vision{1}-audio{2}-abi{3}', matrix.torch, matrix.vision, matrix.audio, matrix.abi)) }}
       builder-base-image: nvidia/cuda:${{ matrix.cuda }}-devel-${{ matrix.os }}
       base-image: nvidia/cuda:${{ matrix.cuda }}-base-${{ matrix.os }}
       torch-version: ${{ matrix.torch }}
       torchvision-version: ${{ matrix.vision }}
       torchaudio-version: ${{ matrix.audio }}
+      cxx11-abi: ${{ matrix.abi }}
       cache-key: base-cuda${{ matrix.cuda }}-${{ matrix.os }}
       build-extras: true
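
Note on the `tag:` expression above: combined with the new `abi` axis in `torch-base.yml`, the matrix should produce one tag per CUDA version, OS, and ABI setting. The shell sketch below enumerates the `tag` values that the format string would yield for the matrix values in this PR, assuming no `image-tag-suffix` override is passed. The function name `list_base_tags` is hypothetical, and the published image tag carries additional components added by `build.yml` (a prefix and short commit SHA via `docker/metadata-action`), which are not modeled here.

```shell
#!/bin/sh
# Values copied from the new .github/configurations/torch-base.yml in this PR.
CUDA_VERSIONS="12.8.0 12.6.3 12.4.1"
OS_VERSIONS="ubuntu22.04"
ABI_VERSIONS="1 0"
TORCH=2.6.0
VISION=0.21.0
AUDIO=2.6.0

# Hypothetical helper: prints one tag per matrix combination, following
# 'base-cuda{cuda}-{os}-torch{torch}-vision{vision}-audio{audio}-abi{abi}'.
list_base_tags() {
    for cuda in $CUDA_VERSIONS; do
        for os in $OS_VERSIONS; do
            for abi in $ABI_VERSIONS; do
                printf 'base-cuda%s-%s-torch%s-vision%s-audio%s-abi%s\n' \
                    "$cuda" "$os" "$TORCH" "$VISION" "$AUDIO" "$abi"
            done
        done
    done
}

list_base_tags
```

With three CUDA versions, one OS, and two ABI settings, the loop emits six tags, which is the number of `torch:base` build jobs the matrix would be expected to schedule.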
8 changes: 6 additions & 2 deletions .github/workflows/torch-extras.yml
@@ -51,13 +51,17 @@ jobs:
   get-required-bases:
     name: Get Latest Required Base Images
     if: inputs.skip-bases-check != true
-    runs-on: ["self-hosted", "Linux"]
+    runs-on: [ cw ]
+    container: 'ghcr.io/coreweave/github-actions-images/github-base-runner:v1.4.0'
+    defaults:
+      run:
+        shell: bash
     permissions:
       packages: read
     outputs:
       bases-list: ${{ steps.choose-bases.outputs.list }}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
           fetch-depth: 0
       - name: Check if torch-extras needs to be rebuilt from previous bases
9 changes: 5 additions & 4 deletions .github/workflows/torch-nccl.yml
@@ -43,11 +43,12 @@ jobs:
     secrets: inherit
     with:
       image-name: ${{ inputs.image-name }}
-      tag: ${{ format('{0}-{1}', format('nccl-cuda{0}-{1}-nccl{2}', matrix.image.cuda, matrix.image.os, matrix.image.nccl), inputs.image-tag-suffix || format('torch{0}-vision{1}-audio{2}', matrix.torch, matrix.vision, matrix.audio)) }}
-      builder-base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.image.cuda }}-${{ matrix.image.cudnn }}-devel-${{ matrix.image.os }}-nccl${{ matrix.image.nccl }}-${{ matrix.image.nccl-tests-hash }}
-      base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.image.cuda }}-${{ matrix.image.cudnn }}-devel-${{ matrix.image.os }}-nccl${{ matrix.image.nccl }}-${{ matrix.image.nccl-tests-hash }}
+      tag: ${{ format('{0}-{1}', format('nccl-cuda{0}-{1}-nccl{2}', matrix.cuda, matrix.os, matrix.nccl), inputs.image-tag-suffix || format('torch{0}-vision{1}-audio{2}-abi{3}', matrix.torch, matrix.vision, matrix.audio, matrix.abi)) }}
+      builder-base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.cuda }}-devel-${{ matrix.os }}-nccl${{ matrix.nccl }}-${{ matrix.nccl-tests-hash }}
+      base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.cuda }}-devel-${{ matrix.os }}-nccl${{ matrix.nccl }}-${{ matrix.nccl-tests-hash }}
       torch-version: ${{ matrix.torch }}
       torchvision-version: ${{ matrix.vision }}
       torchaudio-version: ${{ matrix.audio }}
-      cache-key: nccl-cuda${{ matrix.image.cuda }}-${{ matrix.image.os }}
+      cxx11-abi: ${{ matrix.abi }}
+      cache-key: nccl-cuda${{ matrix.cuda }}-${{ matrix.os }}
       build-extras: true
18 changes: 11 additions & 7 deletions .github/workflows/torch-nightly.yml
@@ -19,7 +19,11 @@ jobs:
   get-nightly-info:
     name:
       Get Nightly Info
-    runs-on: [ self-hosted, Linux ]
+    runs-on: [ cw ]
+    container: 'ghcr.io/coreweave/github-actions-images/github-base-runner:v1.9.0'
+    defaults:
+      run:
+        shell: bash
     outputs:
       pytorch-commit: ${{ steps.get-hash.outputs.pytorch-commit }}
       triton-commit: ${{ steps.get-hash.outputs.triton-commit }}
@@ -89,13 +93,13 @@ jobs:
     uses: ./.github/workflows/read-configuration.yml
     with:
       path: ./.github/configurations/torch-base.yml
-      filter: del(.include)
+      filter: 'del(.include) | .exclude |= . + [{"os": "ubuntu20.04"}]'
   get-nccl-config:
     name: Get torch:nccl Config
     uses: ./.github/workflows/read-configuration.yml
     with:
       path: ./.github/configurations/torch-nccl.yml
-      filter: del(.include)
+      filter: 'del( .include[] | ( .torch, .vision, .audio ) ) | .exclude |= . + [{"os": "ubuntu20.04"}]'
 
   build-base:
     name: Build Nightly torch:base
@@ -130,12 +134,12 @@ jobs:
     secrets: inherit
     with:
       image-name: nightly-torch
-      tag: ${{ format('nccl-{0}-cuda{1}-{2}-nccl{3}-{4}', needs.get-nightly-info.outputs.date, matrix.image.cuda, matrix.image.os, matrix.image.nccl, needs.get-nightly-info.outputs.version-string ) }}
-      builder-base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.image.cuda }}-${{ matrix.image.cudnn }}-devel-${{ matrix.image.os }}-nccl${{ matrix.image.nccl }}-${{ matrix.image.nccl-tests-hash }}
-      base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.image.cuda }}-${{ matrix.image.cudnn }}-devel-${{ matrix.image.os }}-nccl${{ matrix.image.nccl }}-${{ matrix.image.nccl-tests-hash }}
+      tag: ${{ format('nccl-{0}-cuda{1}-{2}-nccl{3}-{4}', needs.get-nightly-info.outputs.date, matrix.cuda, matrix.os, matrix.nccl, needs.get-nightly-info.outputs.version-string ) }}
+      builder-base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.cuda }}-devel-${{ matrix.os }}-nccl${{ matrix.nccl }}-${{ matrix.nccl-tests-hash }}
+      base-image: ghcr.io/coreweave/nccl-tests:${{ matrix.cuda }}-devel-${{ matrix.os }}-nccl${{ matrix.nccl }}-${{ matrix.nccl-tests-hash }}
       torch-version: ${{ needs.get-nightly-info.outputs.pytorch-commit }}
       torchvision-version: ${{ needs.get-nightly-info.outputs.torchvision-commit }}
       torchaudio-version: ${{ needs.get-nightly-info.outputs.torchaudio-commit }}
       triton-version: ${{ needs.get-nightly-info.outputs.triton-commit }}
-      cache-key: nccl-cuda${{ matrix.image.cuda }}-${{ matrix.image.os }}
+      cache-key: nccl-cuda${{ matrix.cuda }}-${{ matrix.os }}
       build-extras: true
11 changes: 5 additions & 6 deletions .github/workflows/torch.yml
@@ -22,10 +22,9 @@ on:
       triton-version:
         required: false
         type: string
-      cuda-arch-support:
+      cxx11-abi:
         required: false
         type: string
-        default: "7.0 7.5 8.0 8.6 8.9 9.0+PTX"
       image-name:
         required: false
         type: string
@@ -67,11 +66,10 @@ on:
         required: false
         description: "Tagged version number from openai/triton to build"
         type: string
-      cuda-arch-support:
+      cxx11-abi:
         required: false
-        description: "Space-separated list of CUDA architectures to support"
+        description: "Build with the CXX11 ABI (1 = enable, 0 = disable)"
         type: string
-        default: "7.0 7.5 8.0 8.6 8.9 9.0+PTX"
       image-name:
         required: false
         description: "Custom name under which to publish the resulting container"
@@ -99,7 +97,8 @@ jobs:
         BUILD_TORCH_VERSION=${{ inputs.torch-version }}
         BUILD_TORCH_VISION_VERSION=${{ inputs.torchvision-version }}
         BUILD_TORCH_AUDIO_VERSION=${{ inputs.torchaudio-version }}
-        ${{ inputs.cuda-arch-support && format('BUILD_TORCH_CUDA_ARCH_LIST={0}', inputs.cuda-arch-support) || '' }}
+        BUILD_TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.6 8.9 9.0+PTX
+        ${{ inputs.cxx11-abi && format('BUILD_CXX11_ABI={0}', inputs.cxx11-abi) || '' }}
         ${{ inputs.triton-version && format('BUILD_TRITON_VERSION={0}', inputs.triton-version) || '' }}
   build-extras:
     name: Build torch-extras
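
Note on the `build-args` lines above: the `${{ input && format(...) || '' }}` expressions emit a `KEY=value` line only when the corresponding input is provided, while `BUILD_TORCH_CUDA_ARCH_LIST` is now always emitted with a fixed value. A rough shell analogue of that "emit only if set" logic uses the `${var:+word}` parameter expansion. This is a sketch for illustration only: the function name `print_build_args` and its positional arguments are hypothetical, and it does not attempt to reproduce GitHub Actions expression semantics.

```shell
#!/bin/sh
# Hypothetical helper: print_build_args TORCH_VERSION [CXX11_ABI] [TRITON_VERSION]
print_build_args() {
    torch_version=$1
    cxx11_abi=${2-}
    triton_version=${3-}

    # Always-present arguments.
    printf 'BUILD_TORCH_VERSION=%s\n' "$torch_version"
    printf 'BUILD_TORCH_CUDA_ARCH_LIST=%s\n' '7.0 7.5 8.0 8.6 8.9 9.0+PTX'

    # ${var:+word} expands to word only when var is set and non-empty,
    # so an omitted optional value contributes no line at all.
    for arg in ${cxx11_abi:+"BUILD_CXX11_ABI=$cxx11_abi"} \
               ${triton_version:+"BUILD_TRITON_VERSION=$triton_version"}; do
        printf '%s\n' "$arg"
    done
}

# Only the two fixed arguments:
print_build_args 2.6.0
# In this shell sketch "0" is a non-empty string, so the ABI line is kept:
print_build_args 2.6.0 0
```

Leaving an optional argument out entirely, rather than passing it empty, lets the Dockerfile's own `ARG` default apply.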
28 changes: 28 additions & 0 deletions sglang/Dockerfile
@@ -0,0 +1,28 @@
+# syntax=docker/dockerfile:1.2
+ARG BASE_IMAGE
+ARG BUILDER_IMAGE="${BASE_IMAGE}"
+
+FROM ${BUILDER_IMAGE} AS builder
+
+ARG BUILD_TORCH_CUDA_ARCH_LIST='8.0 8.6 8.9 9.0 10.0+PTX'
+
+ARG FLASHINFER_COMMIT='c04755e21f4d6fb7813c703f2b00a7ef012be9b8'
+ARG CUTLASS_COMMIT='b78588d1630aa6643bf021613717bafb705df4ef'
+ARG VLLM_COMMIT='5095e966069b9e65b7c4c63427e06cebacaad0a0'
+ARG SGLANG_COMMIT='4b6f62e2bc52a528551e9a21e7b0a4945c6115bb'
+ARG DECORD_COMMIT='d2e56190286ae394032a8141885f76d5372bd44b'
+# Building Triton is not currently enabled,
+# but this is the commit that would be used if it were
+ARG TRITON_COMMIT='1e0e51c4aeb3e1beea000da5d0e494f8b9ac40dd'
+
+WORKDIR /build
+COPY build.bash /build/
+RUN mkdir /wheels && \
+    bash build.bash -a "${BUILD_TORCH_CUDA_ARCH_LIST}" && \
+    rm -rf /build/*
+COPY install.bash /wheels/
+
+FROM ${BASE_IMAGE}
+RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
+    cd /wheels && \
+    bash install.bash