Replies: 1 comment 1 reply
-
This can happen when using mmap. The CPU buffer size in this case represents the size of the memory-mapped file; it is not really a separately allocated buffer. Under Linux the offloaded portions of the model will be unmapped after loading, but that cannot be done on Windows.
-
I have 3 GPUs, two 24 GB and one 12 GB, for a total of 60 GB of VRAM. I'm trying Goliath, which is 66 GB, so I figure roughly 58 GB in VRAM and 10-12 GB at most on the CPU. I'm running with a context size of 512. How come the CPU buffer size is still 67 GB after almost filling up the VRAM?
```
66G Feb 9 07:58 goliath-120b.Q4_K_M.gguf
llm_load_tensors: offloading 110 repeating layers to GPU
llm_load_tensors: offloaded 110/138 layers to GPU
llm_load_tensors: CPU buffer size   = 67364.36 MiB
llm_load_tensors: CUDA0 buffer size = 10584.06 MiB
llm_load_tensors: CUDA1 buffer size = 20978.38 MiB
llm_load_tensors: CUDA2 buffer size = 21809.56 MiB
```