Replies: 1 comment
-
This looks like it is because it is trying to allocate a kv-cache using a context size of 30016. This will cause the kv-cache to have a size of:

```
kv-cache size = 2 *                                  // keys and values
                ctx.cparams.n_ctx *
                ctx.model.hparams.n_layer *
                ctx.model.hparams.n_head_kv(0) *
                ctx.model.hparams.n_embd_head_k *
                ctx.kv_self.type_k                   // type size in bytes (2 for f16)

kv-cache size = 2 * 30016 * 32 * 8 * 128 * 2 bytes
              = 3934257152 bytes
              = 3934257152 / (1024*1024)
              = 3752 MB
```

You might be able to specify a smaller context size using `n_ctx`.
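As a sanity check, here is a minimal Python sketch that reproduces this estimate and shows how much a smaller context shrinks the cache. The helper `kv_cache_bytes` is hypothetical (not a llama.cpp function), and the hyperparameter values are simply the numbers quoted above:

```python
# Sketch only: recompute the kv-cache size from the hyperparameters quoted above.
def kv_cache_bytes(n_ctx, n_layer, n_head_kv, n_embd_head_k, bytes_per_elem=2):
    # 2x for keys and values; an f16 cache uses 2 bytes per element
    return 2 * n_ctx * n_layer * n_head_kv * n_embd_head_k * bytes_per_elem

full = kv_cache_bytes(n_ctx=30016, n_layer=32, n_head_kv=8, n_embd_head_k=128)
print(full, full / (1024 * 1024))      # 3934257152 bytes, 3752.0 MB

small = kv_cache_bytes(n_ctx=4096, n_layer=32, n_head_kv=8, n_embd_head_k=128)
print(small / (1024 * 1024))           # 512.0 MB with n_ctx = 4096
```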
-
I'm attempting to run llama.cpp through the Python wrapper and I'm getting an OOM error, even though the model is 6B, which is smaller than the 8 GB of VRAM on my RTX 3070 Ubuntu system.
The crash happens on the line that attempts to create the llama context (a generic sketch of such a call is shown below).
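For reference, this is a hypothetical sketch of what a context-creation call with llama-cpp-python typically looks like; the model path and parameter values are placeholders, not the code from the question:

```python
from llama_cpp import Llama

# Hypothetical example; "model.gguf" and the values below are placeholders.
llm = Llama(
    model_path="model.gguf",  # placeholder path to the quantized model
    n_ctx=4096,               # a smaller context keeps the kv-cache small
    n_gpu_layers=-1,          # offload all layers; lower this if VRAM is tight
)
```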
The errors given are as follows: