feat(torch): Update torch libraries to v2.5.0, bundle triton, patch TransformerEngine #85
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
PyTorch v2.5.0, triton, & Patched TransformerEngine v1.11

This change includes version updates and some patches, and bundles a source build of triton in the non-nightly ml-containers/torch images (it was previously bundled only in the nightly ones).

PyTorch
The following PyTorch components have been updated:

torch v2.4.1 → v2.5.0
torchvision v0.19.1 → v0.20.0
torchaudio v2.4.1 → v2.5.0

In addition, the patch to fix torchaudio compilation on CUDA 12.5+ (from #82) is now obsolete in the nightly builds of torchaudio (so, for some version after v2.5.0), because the contents of the patch were included as a secondary change in pytorch/audio#3843. This PR therefore adds a check to the build process that applies the patch only in versions where it is necessary.

triton
triton is normally listed as a dependency for the x86_64 Linux releases of PyTorch on PyPI, but it is not automatically registered as a dependency in our images given the way that we build PyTorch from source. Because it would normally be seen as a required dependency, and because each PyTorch release expects a specific version of triton to be paired with it (when using features that require triton), this PR adds a source build of triton as a bundled part of all ml-containers/torch images.

The version of triton to build is pulled with the same method that torch-nightly images use: a commit hash is read from the .ci/docker/ci_commit_pins/triton.txt file in the cloned pytorch/pytorch repository, and that exact git commit of triton is used for the build. PyPI releases of triton are not used, even though they are generally compatible with stable PyTorch versions built from source, like ours, because we already had the code to build triton from a specific commit, and this could potentially be more flexible when building e.g. PyTorch release candidates.

The version of triton to be built can still be overridden by setting the BUILD_TRITON_VERSION build argument explicitly, and the triton build can be turned off completely by setting the new, separate BUILD_TRITON build argument to any value other than 1.

TransformerEngine Updates
This change additionally updates TransformerEngine by a few commits, as it was previously using a commit from right before the v1.11 release. Now that v1.11 is out, it builds from that tag and also applies a patch for a bug in the v1.11 release (NVIDIA/TransformerEngine#1213). That bug was fixed as part of NVIDIA/TransformerEngine#1222 on the v1.11 branch (and in future versions), but the fix is not included in the v1.11 git tag.
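
For reviewers who want the triton version selection from the triton section in one place, here is a minimal Python sketch of that logic. It is illustrative only and is not the code in this PR, where the real logic lives in the image build itself. The function name resolve_triton_commit, the idea of passing build arguments as a mapping, the default of "1" for BUILD_TRITON, and treating an empty BUILD_TRITON_VERSION as unset are all assumptions made for the example. Only the pin file path and the two build argument names come from the description above.

```python
from pathlib import Path
from typing import Mapping, Optional

# Path of the pin file inside a pytorch/pytorch checkout (from the PR description).
PIN_FILE = Path(".ci/docker/ci_commit_pins/triton.txt")


def resolve_triton_commit(
    pytorch_repo: Path, build_args: Mapping[str, str]
) -> Optional[str]:
    """Return the triton git ref to build, or None if the build is disabled.

    Hypothetical helper: `build_args` stands in for the image's build arguments.
    """
    # Any value other than "1" turns the triton build off.
    # Assumption: the argument defaults to "1" when it is not set.
    if build_args.get("BUILD_TRITON", "1") != "1":
        return None

    # An explicit BUILD_TRITON_VERSION takes precedence over the pin file.
    # Assumption: an empty or whitespace-only value counts as "not set".
    override = build_args.get("BUILD_TRITON_VERSION", "").strip()
    if override:
        return override

    # Otherwise use the exact commit that the PyTorch checkout pins.
    return (pytorch_repo / PIN_FILE).read_text().strip()
```

Calling resolve_triton_commit(Path("pytorch"), os.environ) against a cloned pytorch/pytorch checkout would return the pinned commit hash unless one of the two build arguments says otherwise. Checking BUILD_TRITON first means that disabling the build wins even when a version override is also present, which is one plausible reading of "turned off completely".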