Replies: 1 comment
Interesting, playing around with the other flags seemed to get the model
I think the thing that worked was setting
Unfortunately it didn't help with the
Further trial and error and guessing got this one to run:

How would I find out these parameters?
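One thing that has helped me inspect the model side of those parameters is dumping the GGUF metadata. A minimal sketch, assuming the `gguf` Python package is installed (it ships the `gguf-dump` tool; the exact key prefix varies by architecture):

```sh
# Reader tooling from llama.cpp's gguf-py package
pip install gguf

# Dump all metadata; *.block_count is the layer count that
# --n-gpu-layers / LLAMA_ARG_N_GPU_LAYERS is measured against
gguf-dump llama-3_3-nemotron-super-49b-v1-q6_k.gguf | grep -i block_count
```

llama.cpp also prints the same values (n_layer, buffer sizes, and so on) in its startup log, so a verbose load attempt is another way to read them off.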
---
Sometimes I run into an error like the following when trying to load models, for example `llama-3_3-nemotron-super-49b-v1-q6_k.gguf`, which should (maybe?) comfortably fit in VRAM.
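(My rough math: Q6_K stores roughly 6.56 bits per weight, so a 49B-parameter model works out to about 49e9 × 6.56 / 8 ≈ 40 GB for the weights alone, before the KV cache and compute buffers are added on top, hence the "maybe".)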
My environment:

If I increase `LLAMA_ARG_N_GPU_LAYERS` to 80, it complains about running out of memory:

Even if the model is too large to fit entirely in VRAM, I thought it was possible to also utilize the CPU and system RAM? Or am I mistaken?
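For reference, what I understood partial offload to mean: any layers not covered by `--n-gpu-layers` stay in system RAM and run on the CPU. This is the kind of invocation I've been trying (the value 40 is just an illustrative guess, not a tested number for this model):

```sh
# Offload 40 layers to the GPU; the remaining layers stay in system RAM
# and run on the CPU. Lower the value until the GPU buffer allocation
# stops failing.
LLAMA_ARG_N_GPU_LAYERS=40 llama-server -m llama-3_3-nemotron-super-49b-v1-q6_k.gguf

# Equivalent flag form with llama-cli:
llama-cli -m llama-3_3-nemotron-super-49b-v1-q6_k.gguf -ngl 40
```

If I understand it correctly, setting the value at or above the model's total layer count requests a full offload, so failing at 80 would just mean the whole model plus its buffers doesn't fit in VRAM.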
Here's the output of `llama-cli --version`: