llama-cpp-python 0.3.8 with CUDA #2010

SeBL4RD opened this issue May 1, 2025 · 5 comments

SeBL4RD commented May 1, 2025

When will we have a recent version of llama-cpp-python that works with CUDA, installable via pip?
It's a real nightmare to get it working any other way.
0.3.4 works with CUDA, but it doesn't support models like quantized Qwen 3.

With kind regards


JamePeng commented May 2, 2025

You can try compiling the new code I maintain here: https://github.com/JamePeng/llama-cpp-python, but I have only pre-compiled Windows and Linux versions based on the recent code.
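
A minimal from-source sketch for that fork, assuming it keeps upstream's scikit-build-core layout and that a CUDA toolkit is installed:

# Clone with submodules so the vendored llama.cpp sources come along.
git clone --recursive https://github.com/JamePeng/llama-cpp-python
cd llama-cpp-python
CMAKE_ARGS="-DGGML_CUDA=on" pip install . --no-cache-dir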

@m-from-space

> When will we have a recent version of llama-cpp-python that works with CUDA, installable via pip?

I probably misunderstand your question, but I am using 0.3.9 with CUDA via pip. I can use Qwen3-related models as well (using i-quants, for example, or flash attention; is that what you are referring to?). Sorry if my comment is useless.

I built using...
CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
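
A quick sanity check after a build like that; llama_supports_gpu_offload comes from the low-level bindings, and the model path below is a placeholder:

# Confirm the installed version and that the build reports GPU offload support.
python -c "import llama_cpp; print(llama_cpp.__version__, llama_cpp.llama_supports_gpu_offload())"
# Smoke test: offload all layers to the GPU (replace model.gguf with a real file).
python -c "from llama_cpp import Llama; Llama(model_path='model.gguf', n_gpu_layers=-1)"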

@michelonfrancoisSUMMIT

Hi, I think it might come from the fact that the wheels have not been updated to a recent version of llama-cpp-python. You can see that at this link: https://abetlen.github.io/llama-cpp-python/whl/cu122/llama-cpp-python/. The last version available is 0.3.4. Could you add newer wheels, @abetlen?
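
For reference, the documented way to install from that index looks like the line below; it will currently resolve to 0.3.4, the newest cu122 wheel published there:

# Pulls the newest CUDA 12.2 wheel from abetlen's index (0.3.4 at the time of writing).
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122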

@michelonfrancoisSUMMIT

@m-from-space I have tried your solution and got this error:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [47 lines of output]
      *** scikit-build-core 0.11.3 using CMake 3.22.1 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpovycplbn/build/CMakeInit.txt
      -- The C compiler identification is Clang 14.0.0
      -- The CXX compiler identification is Clang 14.0.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/clang - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - failed
      -- Check for working CXX compiler: /usr/bin/clang++
      -- Check for working CXX compiler: /usr/bin/clang++ - broken
      CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
        The C++ compiler
      
          "/usr/bin/clang++"
      
        is not able to compile a simple test program.
      
        It fails with the following output:
      
          Change Dir: /tmp/tmpovycplbn/build/CMakeFiles/CMakeTmp
      
          Run Build Command(s):ninja cmTC_9090e && [1/2] Building CXX object CMakeFiles/cmTC_9090e.dir/testCXXCompiler.cxx.o
          [2/2] Linking CXX executable cmTC_9090e
          FAILED: cmTC_9090e
          : && /usr/bin/clang++  -pthread   CMakeFiles/cmTC_9090e.dir/testCXXCompiler.cxx.o -o cmTC_9090e   && :
          /usr/bin/ld: cannot find -lstdc++: No such file or directory
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
          ninja: build stopped: subcommand failed.
      
      
      
      
      
        CMake will not be able to correctly generate this project.
      Call Stack (most recent call first):
        CMakeLists.txt:3 (project)
      
      
      -- Configuring incomplete, errors occurred!
      See also "/tmp/tmpovycplbn/build/CMakeFiles/CMakeOutput.log".
      See also "/tmp/tmpovycplbn/build/CMakeFiles/CMakeError.log".
      
      *** CMake configuration failed
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
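
The failing step is the linker: "cannot find -lstdc++" means the GNU C++ standard library development files are missing on that machine, not that anything is wrong with llama-cpp-python itself. A plausible fix on Ubuntu 22.04 (which the Clang 14 / CMake 3.22 pair in the log suggests; the exact package name below is an assumption for that release):

# Install the libstdc++ development files that clang++ links against by default.
sudo apt-get install libstdc++-12-dev
# Or sidestep clang entirely and build with GCC (assumes g++ is installed):
CC=gcc CXX=g++ CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade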

SeBL4RD commented May 18, 2025

> When will we have a recent version of llama-cpp-python that works with CUDA, installable via pip?
>
> I probably misunderstand your question, but I am using 0.3.9 with CUDA via pip. I can use Qwen3-related models as well (using i-quants, for example, or flash attention; is that what you are referring to?). Sorry if my comment is useless.
>
> I built using... CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade

Man... I tried for hours to get similar commands to work and never succeeded; I used JamePeng's wheel to make up for it.
I just tried your command and it works fine. I just don't get it.
I don't understand these installations with CMake, and how it can work when abetlen hasn't done anything about it.
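
For what it's worth: PyPI appears to ship llama-cpp-python only as a source distribution, so pip compiles it locally on every install; CMAKE_ARGS just passes the CUDA flags into that local build, which is why it works without abetlen publishing anything new. A sketch of the explicit equivalent (the --no-binary flag merely makes the implicit source build visible):

# Force the source-build path explicitly; with CMAKE_ARGS set, this is
# essentially what the command above already does.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-binary llama-cpp-python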
