
Adding unsupported NVIDIA Maxwell/Pascal/Volta architectures when using CMake >=3.23.0 with CUDA 13 #1779

@studioego

Description


System Info

Platform: x86_64, Ubuntu 24.04

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Python version: Python 3.13.3
CPU: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
GPU: NVIDIA RTX 5060
CUDA Toolkit: 13.0.88
CMake: 3.28.3
GCC: 13.3.0
bitsandbytes: current HEAD c3b8de2

Reproduction

When building bitsandbytes on Ubuntu 24.04 with CUDA 13.0.88 and CMake 3.23.0 or newer, CMake tries to compile for the Maxwell, Pascal, and Volta architectures, which CUDA 13 no longer supports.
Configuration succeeds, but make then fails with the following error:

$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc  -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities  Selected: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Targets: 50-real;52-real;53-real;60-real;61-real;62-real;70-real;72-real;75-real;80-real;86-real;87-real;89-real;90
-- CUDA NVCC Flags:  --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_50'
make[2]: *** [CMakeFiles/bitsandbytes.dir/build.make:119: CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/bitsandbytes.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Expected behavior

I checked the current CMakeLists.txt, specifically https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt#L123-L124
With CMake 3.23 or later, the CUDA architectures should be determined automatically from the detected CUDA version.

In this case (CUDA 13), Maxwell (5.x), Pascal (6.x), and Volta (7.0/7.2) should be excluded, because CUDA 13 no longer supports them.

However, CMake still selected these architectures (CUDA Capabilities Selected: 50;52;53;60;61;62;70;72), which produced the unsupported targets 50-real;52-real;53-real;60-real;61-real;62-real;70-real;72-real.
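
As a sketch of the selection rule described above (not the project's actual CMake logic), the shell function below drops every compute capability below 7.5 when the CUDA major version is 13 or later, and passes the list through unchanged for older toolkits. The function name filter_caps_for_cuda and the hard-coded cutoff of 75 are illustrative assumptions; a real fix would apply the equivalent filter inside CMakeLists.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): remove compute capabilities that
# CUDA 13+ no longer supports (Maxwell 5.x, Pascal 6.x, Volta 7.0/7.2),
# i.e. everything below 75. The cutoff of 75 is an assumption of this sketch.
#   $1: CUDA major version, e.g. 13
#   $2: ';'-separated capability list, e.g. "50;52;...;90"
filter_caps_for_cuda() {
  local cuda_major="$1"
  local caps="$2"
  if [ "$cuda_major" -lt 13 ]; then
    # Older toolkits: keep the list unchanged.
    printf '%s\n' "$caps"
    return 0
  fi
  # Split on ';', keep capabilities >= 75, then join back with ';'.
  printf '%s\n' "$caps" | tr ';' '\n' | awk '$1 >= 75' | paste -sd';' -
}
```

For example, filter_caps_for_cuda 13 "50;52;53;60;61;62;70;72;75;80;86;87;89;90" prints 75;80;86;87;89;90, which is the pre-Blackwell part of the selection shown in the expected build below.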

Expected build

$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc   -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities  Selected: 75;80;86;87;89;90;100;103;110;120;121
-- CUDA Targets: 75-real;80-real;86-real;87-real;89-real;90-real;100-real;103-real;110-real;120-real;121
-- CUDA NVCC Flags:  --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to:  /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
[ 71%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/kernels.cu.o
[ 85%] Linking CUDA device code CMakeFiles/bitsandbytes.dir/cmake_device_link.o
[100%] Linking CXX shared library bitsandbytes/libbitsandbytes_cuda130.so
[100%] Built target bitsandbytes

As a workaround, I set the COMPUTE_CAPABILITY option from CMakeLists.txt explicitly with -DCOMPUTE_CAPABILITY=..., which lets me build bitsandbytes with CUDA 13.
This is my current build command:

$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc  -DCOMPUTE_CAPABILITY="75;80;86;87;89;90;100;103;110;120;121" -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities  Selected: 75;80;86;87;89;90;100;103;110;120;121
-- CUDA Targets: 75-real;80-real;86-real;87-real;89-real;90-real;100-real;103-real;110-real;120-real;121
-- CUDA NVCC Flags:  --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
[ 71%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/kernels.cu.o
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
[ 85%] Linking CUDA device code CMakeFiles/bitsandbytes.dir/cmake_device_link.o
[100%] Linking CXX shared library bitsandbytes/libbitsandbytes_cuda130.so
[100%] Built target bitsandbytes
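
Rather than maintaining the capability list by hand, it could be derived from the toolkit itself. The sketch below assumes that nvcc --list-gpu-arch prints one compute_XX entry per line (or that nvcc --list-gpu-code prints one sm_XX entry per line). The helper name caps_from_arch_list is hypothetical. The resulting list is whatever nvcc reports, so it may differ from the hand-picked list in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): convert nvcc's architecture
# listing, read from stdin, into a ';'-separated list for -DCOMPUTE_CAPABILITY.
caps_from_arch_list() {
  # Keep only plain numeric entries such as compute_75 or sm_75 (skipping
  # suffixed variants such as compute_90a, if present), strip the prefix,
  # sort numerically, drop duplicates, and join with ';'.
  grep -E '^(compute|sm)_[0-9]+$' \
    | sed -E 's/^(compute|sm)_//' \
    | sort -n -u \
    | paste -sd';' -
}

# Example usage (assumes the CUDA 13.0 install path used in this issue):
# caps="$(/usr/local/cuda-13.0/bin/nvcc --list-gpu-arch | caps_from_arch_list)"
# cmake -DCOMPUTE_BACKEND=cuda \
#       -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc \
#       -DCOMPUTE_CAPABILITY="$caps" -S .
```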
