Latest release builds not using AMD GPU on Windows #9256
Unanswered · The-Lord-of-Owls asked this question in Q&A · 2 comments · 5 replies
For my setup I'm using an RX 7600 XT and an uncensored Llama 3.1 model. I'm trying to use llama-server.exe to load the model and run it on the GPU.
System specs:
CPU: 6-core Ryzen 5 (12 threads)
GPU: RX 7600 XT with 16 GB VRAM
System RAM: 64 GB
Model: https://huggingface.co/LWDCLS/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF-IQ-Imatrix-Request
Command line being used:
llama-server -m DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q8_0-imat.gguf --port 8080
Note: The model file is located next to the llama-server.exe file
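A note on GPU offload, hedged since exact flags can shift between releases: llama-server does not move layers to the GPU unless asked. The -ngl / --n-gpu-layers option controls how many layers are offloaded, and it only has an effect in a build compiled with GPU support (Vulkan, CUDA, HIP/ROCm); the OpenBLAS and AVX2 builds are CPU-only by design. Assuming a GPU-capable build and the same model file, the invocation would look like:
llama-server -m DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q8_0-imat.gguf --port 8080 -ngl 99
Passing a number larger than the model's layer count (99 here) just means "offload everything"; an 8B model at Q8_0 is roughly 8-9 GB, which should fit in 16 GB of VRAM.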
Current problem: llama-server starts up, loads the model, and lets me connect in my browser just fine. When I type something in, I get a response as intended. The actual PROBLEM is that it only runs on the CPU and completely ignores the GPU. I don't see any setting on the browser page to make it use the GPU explicitly, nor did I notice a CLI option for this.
I know it isn't using the GPU because GPU utilization stays at 1% in Task Manager and GPU memory usage is no different from before starting llama-server.
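One way to confirm this from the server side, assuming recent builds still log it the same way: the startup output reports how many layers landed on the GPU, with a line along the lines of
llm_load_tensors: offloaded 0/33 layers to GPU
(the exact wording varies by version). If the first number is 0, inference runs entirely on the CPU no matter which binary was downloaded. Also worth knowing: Task Manager's default GPU graph on Windows often shows the 3D/graphics engine rather than compute, so the "Compute" view or the dedicated-VRAM counter is a more reliable indicator of GPU inference.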
The builds I've tried from the latest release (see the note after this list):
llama-b3647-bin-win-openblas-x64.zip: works, but only runs on the CPU (the build in use while writing this post)
llama-b3647-bin-win-avx2-x64.zip: works, but only runs on the CPU
llama-b3647-bin-win-cuda-cu12.2.0-x64.zip: does not load at all and gives a Windows error about missing NVIDIA CUDA components
llama-b3647-bin-win-vulkan-x64.zip: starts llama-server but eventually fails to load the model (error provided below)
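On the Vulkan failure: a first sanity check, assuming the Vulkan runtime tools that ship with AMD's driver or the Vulkan SDK are installed, is to confirm the GPU is visible to Vulkan at all:
vulkaninfo --summary
That should list the RX 7600 XT as a physical device; if it doesn't, updating the Adrenalin driver is the usual fix. For an AMD card, the Vulkan build (or a HIP/ROCm build) is the one that can actually use the GPU; the CUDA build will only ever work on NVIDIA hardware.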
Error from the Vulkan build:
Reply from The-Lord-of-Owls:
I didn't compile anything; I used the prebuilt binaries from the releases page. I'll try doing a compile in a couple of hours or in the morning and see if that works.