feat(torch): Support compute capability 12.0 #99

Eta0 · 2025-06-18T19:34:05Z

Compute Capability 12.0 + CUDA 12.9 Fixes

This change adds compute capability 12.0 (sm_120) to the list of compiled architectures for libraries in the ml-containers/torch and ml-containers/torch-extras images. It also fixes CUDA 12.9 support when compiling PyTorch extension modules by cherry-picking a patch from the main branch of the PyTorch repo into our torch build. In addition, torch now uses the system NVTX library when available (i.e. on CUDA 12.9).
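
As a rough illustration of the NVTX part of this change, the sketch below picks extra build environment variables from the CUDA version. `USE_SYSTEM_NVTX` is the variable named in the commits; the helper name `torch_build_env` and the exact version comparison are assumptions made for illustration, not the logic in torch/Dockerfile.

```python
def torch_build_env(cuda_version: str) -> dict[str, str]:
    """Return extra environment variables for a PyTorch build.

    Hypothetical helper: the name and the version check are assumptions.
    """
    # Compare on major.minor only, so "12.9.1" is treated as CUDA 12.9.
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    env: dict[str, str] = {}
    # Assumption: only CUDA 12.9 builds use the system NVTX library.
    if (major, minor) == (12, 9):
        env["USE_SYSTEM_NVTX"] = "1"
    return env
```

With this sketch, `torch_build_env("12.9.1")` sets `USE_SYSTEM_NVTX`, while `torch_build_env("12.8.0")` returns an empty mapping and leaves the build on its default NVTX handling.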

xformers in ml-containers/torch-extras has also been updated to its v0.0.30 release, which offers better compatibility with newer GPU architectures, newer PyTorch versions, and vLLM.

Finally, this update drops compute capability 7.0 (e.g. V100), leaving 7.5 (e.g. Quadro RTX 4000/5000) as the lowest supported architecture.
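
The combined effect on the architecture list can be sketched as follows, assuming the list is a space-separated string in the style of PyTorch's `TORCH_CUDA_ARCH_LIST`. The starting list and the helper `update_arch_list` are illustrative assumptions, not the actual values or code in these images.

```python
def update_arch_list(arch_list: str, drop: set[str], add: set[str]) -> str:
    """Return a space-separated arch list with entries dropped and added."""
    entries = [arch for arch in arch_list.split() if arch not in drop]
    entries += [arch for arch in sorted(add) if arch not in entries]

    def sort_key(entry: str) -> tuple[int, int]:
        # Ignore suffixes such as "+PTX" or a trailing "a" when ordering.
        version = entry.split("+")[0].rstrip("a")
        major, minor = version.split(".")
        return (int(major), int(minor))

    return " ".join(sorted(entries, key=sort_key))


# Illustrative starting list only; not the list used in the Dockerfile.
old_list = "7.0 7.5 8.0 8.6 8.9 9.0 10.0"
new_list = update_arch_list(old_list, drop={"7.0"}, add={"12.0"})
print(new_list)  # 7.5 8.0 8.6 8.9 9.0 10.0 12.0
```

The same helper also covers the case of dropping the architecture-specific variant 12.0a while keeping 12.0, by passing `drop={"12.0a"}`.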

sangstar

LGTM!

torch/Dockerfile

Eta0 added 8 commits June 10, 2025 12:57

feat(torch): Support compute capability 12.0

41ddc1d

build(torch)!: Drop compute capability 7.0

531a3c5

build(torch)!: Drop compute capability 12.0a (but not 12.0)

18cf94b

ci(torch): Set fail-fast: false on torch builds

2bf6b9c

build(torch): Set USE_SYSTEM_NVTX=1 for PyTorch + extension builds

8feb1c4

build(torch): Cherry-pick patch to fix PyTorch extensions w/ CUDA 12.9

541f1e9

feat(torch-extras): Update xformers to v0.0.30

67208ca

build(torch): Only set USE_SYSTEM_NVTX=1 for PyTorch with CUDA 12.9

0d10e96

Eta0 requested review from wbrown and sangstar June 18, 2025 19:34

Eta0 self-assigned this Jun 18, 2025

Eta0 added the enhancement label Jun 18, 2025

sangstar approved these changes Jun 18, 2025

torch/Dockerfile: two review threads, both resolved

Eta0 merged commit f359507 into main Jun 18, 2025
83 of 92 checks passed

Eta0 deleted the es/compute-12.0 branch June 18, 2025 21:00
