Hi, I am still new to llama.cpp. I tried to load a large model (DeepSeek-V2) on a large computer with 512 GB of DDR5 memory, but it failed while trying to allocate far more memory than that. Is this an issue with llama.cpp, or with the model?
> It tried to allocate 805306368032 bytes (750 GiB).

Yes, this is likely what happened.

Try using a smaller context size with the `-c` flag. For example (extending the command you've used):

$ ./llama-server -m ./models/deepseek.gguf --port 8080 -c 32768

This should make the huge buffer from before much smaller, at around 150 GiB (since 32768 is 5 times smaller than 163840, and 150 GiB is 5 times smaller than 750 GiB). If that's still too big, try smaller values after `-c`.
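If you want to pick a value up front rather than by trial and error, here is a minimal sketch, assuming the buffer scales linearly with context length (which is what the 750 GiB → 150 GiB numbers above suggest). The 163840-token default and the 750 GiB figure are taken from this thread; everything else is illustrative:

```python
# Rough estimate of the context buffer size for a given -c value,
# assuming linear scaling from the observed 750 GiB allocation at
# the model's default context of 163840 tokens (an assumption based
# on the numbers quoted in the reply above).
OBSERVED_GIB = 750
DEFAULT_CTX = 163840

def estimated_buffer_gib(ctx: int) -> float:
    """Scale the observed allocation linearly by context length."""
    return OBSERVED_GIB * ctx / DEFAULT_CTX

# Print estimates for a few candidate -c values.
for ctx in (163840, 65536, 32768, 16384, 8192):
    print(f"-c {ctx:6d}  ->  ~{estimated_buffer_gib(ctx):6.1f} GiB")
```

On a 512 GB machine this suggests anything at or below roughly `-c 65536` (about 300 GiB by this estimate) leaves headroom for the model weights and the rest of the system, but the safest approach is still to start small and increase until it fits.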