Replies: 2 comments · 1 reply
-
But with ggml, you should always offload all layers. There are four model sizes. It might also be difficult for developers to optimize if the NVIDIA drivers automatically offload to RAM nowadays. :(
1 reply
-
No.
-
I don't want to have to experiment to find the optimal number for -ngl, and would rather have llama.cpp choose it for me. Is there anything built in, or does anyone have a recommendation for how I could do this outside of llama.cpp?
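In the absence of a built-in option, here is a minimal sketch of one workaround: estimate how many layers fit in free VRAM and pass the result as -ngl. Everything in it is an assumption to adapt, not llama.cpp's own logic; the layer count must be supplied by hand (llama.cpp prints it when loading a model), the per-layer size is crudely approximated as the file size divided evenly across layers, the headroom reserved for the KV cache and compute buffers is a guess, and the llama-cli binary name and path may differ in your build.

```python
#!/usr/bin/env python3
"""Rough heuristic for picking -ngl before launching llama.cpp.

Assumptions: a single GGUF model file, nvidia-smi on PATH, and a
known layer count (printed by llama.cpp at load time).
"""
import os
import subprocess
import sys

def free_vram_mib() -> int:
    # Ask the driver for free memory on GPU 0, in MiB.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

def guess_ngl(model_path: str, n_layers: int, headroom_mib: int = 1500) -> int:
    # Crude estimate: treat all layers as equal in size and leave
    # headroom for the KV cache and compute buffers (a rough guess).
    model_mib = os.path.getsize(model_path) / (1024 * 1024)
    per_layer_mib = model_mib / n_layers
    budget = free_vram_mib() - headroom_mib
    if budget <= 0:
        return 0
    return min(n_layers, int(budget / per_layer_mib))

if __name__ == "__main__":
    model, layers = sys.argv[1], int(sys.argv[2])
    ngl = guess_ngl(model, layers)
    print(f"launching with -ngl {ngl}")
    # Binary name is a placeholder: older builds call it 'main',
    # newer ones 'llama-cli'. Adjust the path and flags to taste.
    subprocess.run(["./llama-cli", "-m", model, "-ngl", str(ngl), "-p", "Hello"])
```

As a side note, passing an -ngl value at least as large as the model's layer count offloads everything, which is what the "always offload all layers" advice above amounts to.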