Questions related to llama.cpp options #3111
Answered by staviq
zastroyshchik asked this question in Q&A
Replies: 2 comments · 9 replies
Answer selected by zastroyshchik
It was in fact not. Llama 2 was pretrained on
No, you do. It defaults to 512.
Can you elaborate on that? Does the output seem cut off?
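Given that the context size defaults to 512, it has to be raised explicitly on the command line. A minimal sketch; the `./main` binary name and the model path are placeholders, and only the flag itself comes from this thread:

```shell
# Placeholder binary and model path; --ctx-size is the flag under discussion.
./main -m ./models/model-q4_0.bin --ctx-size 2048
```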
I use mainly this model, quantized at q4_0 and q5_1:

For now, I am testing it on two 1070 Ti cards, and I get ~4 t/s with q4_0. But now I have some questions (for the sake of simplicity, we will only consider the q4_0-quantized model):

What if I set `--ctx-size` to, say, 7296? Should I get any improvement? Or does it just waste memory? Or does it do nothing? Indeed, if I set `--ctx-size` to 2048, I get this output:

And if I set it to 7296, I get this:

But it seems to impact neither the output length nor the memory usage.
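For intuition on why a larger `--ctx-size` would normally be expected to cost memory, here is a back-of-the-envelope sketch of KV-cache growth. The layer count, embedding width, and fp16 cache layout below are illustrative assumptions, not values taken from this discussion:

```python
# Rough estimate of KV-cache memory as a function of --ctx-size.
# Assumption: one K and one V vector of n_embd fp16 values (2 bytes each)
# per layer per token. n_layer and n_embd are hypothetical placeholders.

def kv_cache_bytes(n_ctx: int, n_layer: int = 60, n_embd: int = 6656,
                   bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) * layers * tokens * embedding width * element size
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

for n_ctx in (512, 2048, 7296):
    print(f"n_ctx={n_ctx}: ~{kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
```

The point is only that the cache scales linearly with the context length, so a jump from 2048 to 7296 should be visible in memory usage if the cache is actually allocated at that size.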
With `-ngl 38` and `--low-vram`, I am yet "surprised" to see that llama.cpp uses around 20GB of RAM, in addition to the ~15GB of VRAM. But the q4_0 model is 17.7GB, so why is so much RAM used? Here is what I get with these options: `--ctx-size 2048 -ngl 38 -t 6 --keep -1 --main-gpu 1 --multiline-input --tensor-split 54,46 --low-vram --mlock`

Yet llama.cpp uses an additional 20GB of RAM. Does that mean `--mlock` implies loading the whole model into RAM, in addition to the VRAM?

Should I set `--batch-size` to some huge value, or should I keep it the same as `--ctx-size`? Are these options related?

The option `--tensor-split` seems to round to one decimal place, right? Because I tried values like `--tensor-split 54.5,44.5` but did not get the expected result (I get `CUDA error 2 at ggml-cuda.cu:5031: out of memory`).

llama.cpp still produces log files (main..log and llama..log) despite the use of `--log-disable`. What am I missing?
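On the `--tensor-split` question: the split values are generally treated as proportions and normalized against their sum, so `54,46` and `27,23` would describe the same split. A minimal sketch of that normalization (an illustration of the idea, not llama.cpp's actual code):

```python
# Normalize proportional split values into per-GPU fractions.
# This mirrors the general idea only; it is not llama.cpp source code.

def normalize_split(parts):
    total = sum(parts)
    return [p / total for p in parts]

print(normalize_split([54, 46]))      # fractions of the model per GPU
print(normalize_split([54.5, 44.5]))  # fractional inputs normalize too
```

Under this interpretation, fractional inputs are not rounded away; an out-of-memory error would more likely mean the resulting fraction simply places more tensors on one GPU than its VRAM can hold.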