feat(torch): Update torch libraries to v2.5.0, bundle triton, patch TransformerEngine #85

Merged: 7 commits merged on Oct 23, 2024

@Eta0 (Collaborator) commented Oct 22, 2024

PyTorch v2.5.0, triton, & Patched TransformerEngine v1.11

This change includes version updates and some patches, and bundles a source build of triton in the non-nightly ml-containers/torch images (it was previously bundled only in the nightly ones).

PyTorch

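The following PyTorch components have been updated, per the versions in the build image tags: torch v2.5.0, torchvision v0.20.0, and torchaudio v2.5.0.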

In addition, the patch to fix torchaudio compilation on CUDA 12.5+ (from #82) is now obsolete in the nightly builds of torchaudio (so, for some version after v2.5.0), because the contents of the patch were included as a secondary change in pytorch/audio#3843. This PR therefore adds a check to the build process so that the patch is applied only in versions where it is necessary.
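As a rough illustration of such a check, the sketch below decides whether to patch by inspecting the source itself rather than the version string. This is an assumption about one possible approach, not the check this PR actually implements; `patch_is_needed` and `fixed_marker` are hypothetical names. A content-based test is used here because version numbers alone may not separate the cases: the nightly images in this thread are tagged audio2.5.0a0.

```python
from pathlib import Path


def patch_is_needed(source_file: Path, fixed_marker: str) -> bool:
    """Return True when the fix is absent, i.e. the patch should be applied.

    `fixed_marker` is a hypothetical snippet of text that only appears in
    sources that already contain the fix (upstream or previously patched).
    """
    return fixed_marker not in source_file.read_text(encoding="utf-8")
```

A build script could call this once per affected file and skip the patch step whenever it returns False.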

triton

triton is normally listed as a dependency for the x86_64 Linux releases of PyTorch on PyPI, but it is not automatically registered as a dependency in our images given the way that we build PyTorch from source. Because it would normally be seen as a required dependency, and because each PyTorch release expects a specific version of triton to be paired with it (when using features that require triton), this PR adds a source build of triton as a bundled part of all ml-containers/torch images.

The version of triton to build is selected with the same method that the torch-nightly images use: a commit hash is read from the .ci/docker/ci_commit_pins/triton.txt file in the cloned pytorch/pytorch repository, and that exact git commit of triton is built. PyPI releases of triton are not used, even though they are generally compatible with stable PyTorch versions built from source, like ours. We already had the code to build triton from a specific commit, and this approach could potentially be more flexible when building, e.g., PyTorch release candidates.
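The pin lookup described above can be sketched as follows. This is a minimal illustration, not the repository's actual build code: the function name `read_triton_pin` is hypothetical, and it assumes the pin file contains a single hexadecimal git hash.

```python
import re
from pathlib import Path

# Location of the pin file inside a pytorch/pytorch checkout,
# as named in the PR description.
TRITON_PIN_FILE = Path(".ci/docker/ci_commit_pins/triton.txt")


def read_triton_pin(pytorch_repo: Path) -> str:
    """Return the triton commit hash pinned by a pytorch/pytorch checkout."""
    pin = (pytorch_repo / TRITON_PIN_FILE).read_text(encoding="utf-8").strip()
    # Assumption: the file holds one abbreviated or full hexadecimal git hash.
    if not re.fullmatch(r"[0-9a-fA-F]{7,40}", pin):
        raise ValueError(f"unexpected triton pin: {pin!r}")
    return pin
```

Validating the value before use means a malformed pin fails early, before any triton checkout is attempted.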

The version of triton to build can still be overridden by setting the BUILD_TRITON_VERSION build argument explicitly, and the triton build can be turned off completely by setting the new, separate BUILD_TRITON build argument to any value other than 1.
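The interaction of the two build arguments can be modelled with a short sketch. It is illustrative only: `resolve_triton_commit` is a hypothetical helper, and treating an unset BUILD_TRITON as 1 is an assumption, since the description does not state the default.

```python
from typing import Mapping, Optional


def resolve_triton_commit(
    build_args: Mapping[str, str], pinned_commit: str
) -> Optional[str]:
    """Return the triton revision to build, or None when the build is disabled."""
    # Assumption: BUILD_TRITON behaves as "1" when it is not set.
    # Any value other than "1" turns the triton build off entirely.
    if build_args.get("BUILD_TRITON", "1") != "1":
        return None
    # An explicit, non-empty BUILD_TRITON_VERSION overrides the pinned commit.
    return build_args.get("BUILD_TRITON_VERSION") or pinned_commit
```

With Docker, such arguments would typically be supplied through `--build-arg`, for example `--build-arg BUILD_TRITON=0` to skip the triton build.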

TransformerEngine Updates

This change additionally updates TransformerEngine by a few commits, as it was previously built from a commit from right before the v1.11 release. Now that v1.11 is out, it builds from that tag, and additionally includes a patch for a bug in the v1.11 release (NVIDIA/TransformerEngine#1213). That bug was fixed as part of NVIDIA/TransformerEngine#1222 on the v1.11 branch (and in future versions), but not in the v1.11 git tag.

@Eta0 added the bug and enhancement labels on Oct 22, 2024
@Eta0 requested a review from wbrown on October 22, 2024 21:57
@Eta0 self-assigned this on Oct 22, 2024

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.4.1-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.4.1-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.6.1-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.2.2-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.2.2-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.4.1-ubuntu20.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-base-cuda12.6.1-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.4.1-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.2.2-ubuntu20.04-nccl2.21.5-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.2.2-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.6.1-ubuntu20.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch:es-torch-updates-3a941b9-nccl-cuda12.6.1-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.4.1-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.2.2-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.2.2-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.4.1-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.6.1-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-base-24102222-cuda12.6.1-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-nccl-24102222-cuda12.4.1-ubuntu20.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-nccl-24102222-cuda12.6.1-ubuntu22.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-nccl-24102222-cuda12.2.2-ubuntu22.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.4.1-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.4.1-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.6.1-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.2.2-ubuntu22.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-nccl-24102222-cuda12.6.1-ubuntu20.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.4.1-ubuntu20.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.6.1-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368570
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-base-cuda12.2.2-ubuntu20.04-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.4.1-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.2.2-ubuntu20.04-nccl2.21.5-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.6.1-ubuntu20.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.6.1-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368567
Image: ghcr.io/coreweave/ml-containers/torch-extras:es-torch-updates-3a941b9-nccl-cuda12.2.2-ubuntu22.04-nccl2.23.4-1-torch2.5.0-vision0.20.0-audio2.5.0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.4.1-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.4.1-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.6.1-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.2.2-ubuntu20.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.2.2-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-base-24102222-cuda12.6.1-ubuntu22.04-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-nccl-24102222-cuda12.4.1-ubuntu20.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-nccl-24102222-cuda12.4.1-ubuntu22.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-nccl-24102222-cuda12.6.1-ubuntu22.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-nccl-24102222-cuda12.6.1-ubuntu20.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch-extras:es-torch-updates-3a941b9-nccl-24102222-cuda12.2.2-ubuntu22.04-nccl2.23.4-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@Eta0 Build complete, success: https://github.com/coreweave/ml-containers/actions/runs/11469368645
Image: ghcr.io/coreweave/ml-containers/nightly-torch:es-torch-updates-3a941b9-nccl-24102222-cuda12.2.2-ubuntu20.04-nccl2.21.5-1-torch2.6.0a0-vision0.20.0a0-audio2.5.0a0

@wbrown (Collaborator) left a comment

👍

@wbrown wbrown merged commit f575d1b into main Oct 23, 2024
102 checks passed
@wbrown wbrown deleted the es/torch-updates branch October 23, 2024 19:49