Description
I'm encountering difficulties installing llama-cpp-python with GPU support on an AWS g5.4xlarge instance.
Here are the details of the issue:
Commands Attempted:
```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.2.77 --no-cache-dir
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python==0.2.77 --no-cache-dir
```
Error Encountered:
When attempting to install llama-cpp-python with the CMAKE_ARGS options, I receive the following error:
```
ubuntu@ip-172-31-25-122:~/datafari-rag$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.88.tar.gz (63.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.7/63.7 MB 171.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting numpy>=1.20.0
Downloading numpy-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 347.6 MB/s eta 0:00:00
Collecting jinja2>=2.11.3
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 362.0 MB/s eta 0:00:00
Collecting diskcache>=5.6.1
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 281.5 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
*** scikit-build-core 0.10.2 using CMake 3.30.2 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmp7laa280n/build/CMakeInit.txt
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.25.1")
CMake Error at vendor/llama.cpp/CMakeLists.txt:95 (message):
LLAMA_CUBLAS is deprecated and will be removed in the future.
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```
When I change to the new flag (upstream llama.cpp replaced LLAMA_CUBLAS with GGML_CUDA, which is what the error above is about):
```
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
```
the installation completes without errors, but the GPU is not used at inference time (see the check below).
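A minimal way to confirm whether a given build actually offloads to the GPU (./model.gguf is a placeholder path to any local GGUF model file):
```
python3 - <<'EOF'
from llama_cpp import Llama

# verbose=True makes llama.cpp print backend/device information to stderr;
# n_gpu_layers=-1 requests that all layers be offloaded to the GPU.
llm = Llama(model_path="./model.gguf", n_gpu_layers=-1, verbose=True)
EOF
```
A CUDA-enabled build prints its CUDA device(s) at startup and a line along the lines of "offloaded X/Y layers to GPU"; a CPU-only build prints neither, which is how I can tell the GPU is not being used.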
Questions:
Is there a specific version of llama-cpp-python, or a particular set of CMAKE_ARGS, that currently builds with working GPU support on this setup?
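For reference, a sketch of the rebuild I would try next, adding pip's --verbose flag so the CMake configure output shows whether the CUDA backend is actually detected (I have not confirmed this resolves the problem):
```
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 \
  pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose
```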
Thank you in advance for your help!
System Info
Operating System: Ubuntu
Virtual Machine: AWS g5.4xlarge
Python: 3.10.9
Installed Libraries:
fastapi==0.110.0
uvicorn==0.23.2
langchain==0.1.12
langchain-cli==0.0.21
langchain-community==0.0.28
langchain-core==0.1.31
langchain-text-splitters==0.0.1
python-dotenv==1.0.1
pydantic==1.10.13
fastapi-limiter==0.1.6
llama-cpp-python==0.2.77