llama-cpp-python 0.3.8 with CUDA #2010
Comments
You can try to compile the new code I maintain here: https://github.com/JamePeng/llama-cpp-python, but I have only pre-compiled the Windows and Linux versions based on the recent code.
I probably misunderstand your question, but I am using 0.3.9 with CUDA via pip. I can use Qwen3-related models as well (using I-quants, for example, or flash attention; is that what you are referring to?). Sorry if my comment is useless. I built using...
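For reference, a minimal sketch of the source-build route the two comments above describe, assuming the CUDA toolkit, a matching host compiler, and CMake are already on PATH. The `-DGGML_CUDA=on` flag is the one documented for recent llama-cpp-python releases (older ones used `-DLLAMA_CUBLAS=on`), and the fork is assumed to keep the upstream build layout:

```
# Build the upstream package from source with CUDA enabled.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

# Or build the fork mentioned above from a local checkout
# (--recursive pulls in the vendored llama.cpp submodule).
git clone --recursive https://github.com/JamePeng/llama-cpp-python
CMAKE_ARGS="-DGGML_CUDA=on" pip install ./llama-cpp-python
```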
Hi, I think it might come from the fact that the wheels have not been updated to a recent version of llama-cpp-python. You can see that at this link: https://abetlen.github.io/llama-cpp-python/whl/cu122/llama-cpp-python/. The last version available is 0.3.4. Could you add newer wheels, @abetlen?
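For anyone trying that index, a hedged example of installing from the prebuilt CUDA 12.2 wheels. Pinning the version matters here, because newer releases exist on PyPI only as a source distribution, which pip would otherwise prefer and build without CUDA:

```
pip install llama-cpp-python==0.3.4 \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
```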
@m-from-space I have tried your solution and I got this error:
Man... I've tried for hours to get similar commands to work and never succeeded; I used JamePeng's wheel to make up for it.
When will a recent version of llama-cpp-python that works with CUDA be available via pip?
It's a real nightmare to make it work any other way.
0.3.4 works with CUDA, but it doesn't handle models like quantized Qwen 3.
With kind regards
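A quick sanity check after any of the install routes above (a hedged sketch: `llama_supports_gpu_offload` comes from the low-level bindings in recent releases; if your version predates it, the version string alone still tells you whether the 0.3.4 wheel was picked):

```
# Confirm which version pip actually installed and whether the build can offload to GPU.
python -c "import llama_cpp; print(llama_cpp.__version__)"
python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
```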