If a layer is not offloaded to the GPU, it will still use cuBLAS; the only difference is that its data has to be copied to the device before the calculation.
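To make that concrete, here is a rough Python sketch of the behaviour described above. It is only an illustrative model, not llama.cpp code: the names `Layer`, `plan_layer` and `count_copies` are hypothetical, and treating the first `n_offloaded` layers as the offloaded ones is an arbitrary assumption made for the example.

```python
from dataclasses import dataclass
from typing import List

COPY_STEP = "copy data host -> device"
MATMUL_STEP = "cuBLAS matmul"


@dataclass
class Layer:
    index: int
    on_gpu: bool  # True if this layer was offloaded to the GPU


def plan_layer(layer: Layer) -> List[str]:
    """Return the illustrative steps for one layer's calculation."""
    steps = []
    if not layer.on_gpu:
        # The layer's data lives in host RAM, so it has to be copied
        # to the device before cuBLAS can work on it.
        steps.append(COPY_STEP)
    # Offloaded or not, the calculation itself goes through cuBLAS.
    steps.append(MATMUL_STEP)
    return steps


def count_copies(n_layers: int, n_offloaded: int) -> int:
    """Count the layers that need a host-to-device copy.

    Assumption for illustration only: the first n_offloaded layers
    are the ones resident on the GPU.
    """
    layers = [Layer(i, on_gpu=(i < n_offloaded)) for i in range(n_layers)]
    return sum(COPY_STEP in plan_layer(layer) for layer in layers)
```

For example, `count_copies(32, 20)` models a 32-layer model with 20 layers offloaded: the remaining 12 layers each pay for a copy, but every layer still ends in a cuBLAS call.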
I only have a 6GB NVIDIA GPU, so most of the time I will be offloading just some of the model layers to the GPU.
Does it make sense to compile with both LLAMA_OPENBLAS=1 and LLAMA_CUBLAS=1 enabled?
Will that give any overall performance improvement?