feat!: Update to new runners, update PyTorch, add ARM builds, add Blackwell support, add sglang image #87

Merged
wbrown merged 95 commits into main from es/actions
Feb 22, 2025

Conversation

@Eta0 (Collaborator) commented Feb 14, 2025

Many updates (CI, building for more architectures, PyTorch libraries, etc.)

This PR merges an overly long-running branch back into main. It comprises most of the updates to PyTorch and the CI system from the last few months. Some highlights:

  • Switched to use newer GHA CI runners
  • Added support for multi-arch amd64+arm64 image builds (now the default)
    • The ml-containers/torch and ml-containers/torch-extras build processes were adapted to support arm64 as well
  • Dropped Ubuntu 20.04 builds
    • Python 3.8, included in Ubuntu 20.04, is now EOL and unsupported by PyTorch
  • Added CUDA 12.8 builds for PyTorch
  • Added PyTorch builds with compute capabilities 9.0a, 10.0, and 10.0a enabled (on applicable CUDA versions)
  • Added PyTorch builds with the CXX11 ABI disabled, in addition to the ones with it enabled (previous default)
    • This change will likely be reverted later: PyTorch plans to enable the CXX11 ABI in its own builds, at which point keeping it enabled will give the best compatibility
  • Updated PyTorch (and friends) to v2.6.0
  • Updated many ml-containers/torch and ml-containers/torch-extras bundled libraries
  • Added Flash Attention v3 to the ml-containers/torch image
  • Added an sglang image
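
The "on applicable CUDA versions" qualifier in the list above can be illustrated with a small sketch. The helper name `applicable_arches` and the minimum-version table are assumptions for illustration and are not taken from this repository's build scripts; the thresholds assume that `9.0a` targets became available with CUDA 12.0 and `10.0`/`10.0a` with CUDA 12.8.

```python
# Hypothetical helper: not part of ml-containers. The minimum CUDA versions
# below are assumptions about when nvcc gained each target.
MIN_CUDA_FOR_ARCH = {
    "9.0a": (12, 0),
    "10.0": (12, 8),
    "10.0a": (12, 8),
}


def parse_version(text):
    """Turn "12.8" or "12.8.0" into a comparable (major, minor) tuple."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def applicable_arches(cuda_version, requested):
    """Keep only the requested arches that the given CUDA version can target.

    Arches missing from MIN_CUDA_FOR_ARCH are passed through unchanged.
    """
    cuda = parse_version(cuda_version)
    return [
        arch for arch in requested
        if cuda >= MIN_CUDA_FOR_ARCH.get(arch, (0, 0))
    ]


if __name__ == "__main__":
    wanted = ["8.0", "9.0", "9.0a", "10.0", "10.0a"]
    print(applicable_arches("12.6", wanted))
    print(applicable_arches("12.8", wanted))
```

Keeping the thresholds in one table means a new compute capability only needs one new entry, rather than another branch in the build logic.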

There were also various bugfixes, improvements, and customizability enhancements added along the way noted in commit messages.

Eta0 added 30 commits November 4, 2024 14:48
Eta0 added 6 commits February 7, 2025 01:14
The previous method didn't work when 10.0 was included in the
BUILD_TORCH_CUDA_ARCH_LIST build argument, so this uses shell
parameter expansion hackery to get around that.

This also keeps the previous logic, but switches it
to force sm_100a builds on supported CUDA versions.
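
The commit message does not include the shell code itself, so the following is only a Python sketch of the general idea, not the actual implementation: split the arch list into tokens, match `10.0` as a whole token rather than as a substring (which would also hit `10.0a`), and swap in the `a` variant when the CUDA version supports it. The function name, the `+PTX` suffix handling, and the 12.8 threshold are assumptions for illustration.

```python
import re


def force_arch_specific(arch_list, cuda_version, base="10.0", min_cuda=(12, 8)):
    """Replace `base` with `base + "a"` in a TORCH_CUDA_ARCH_LIST-style string.

    Hypothetical sketch; the real build uses shell parameter expansion.
    """
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    # Arch lists may be separated by spaces or semicolons.
    tokens = [t for t in re.split(r"[;\s]+", arch_list.strip()) if t]
    if (major, minor) < min_cuda:
        # CUDA version too old for the arch-specific target: leave as-is.
        return " ".join(tokens)
    result = []
    for token in tokens:
        # Split off an optional "+PTX" suffix so "10.0+PTX" still matches.
        arch, plus, suffix = token.partition("+")
        if arch == base:
            arch = base + "a"
        token = arch + plus + suffix
        if token not in result:  # avoid a duplicate if "10.0a" was already listed
            result.append(token)
    return " ".join(result)


if __name__ == "__main__":
    print(force_arch_specific("8.0 9.0 10.0+PTX", "12.8"))
```

Comparing whole tokens is the design point here: a plain substring test for `10.0` cannot tell `10.0` apart from `10.0a`.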
@Eta0 added the bug and enhancement labels Feb 14, 2025
@Eta0 requested a review from wbrown February 14, 2025 21:29
@Eta0 self-assigned this Feb 14, 2025
@wbrown (Collaborator) left a comment

Heckin' big. One question, otherwise looks great to me.

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13338351608
Image: ``

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/13338351603
Image: ``

@wbrown merged commit d6cea4b into main Feb 22, 2025
40 of 61 checks passed
@wbrown deleted the es/actions branch February 22, 2025 20:47