-
Exact same crash with
I read https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
My case: I use a known-good model of ~8 GB, but for the test I keep only ~7 GB of VRAM free, so the model won't load without limiting it, and yet it does load. VRAM fills up, nothing freezes, and about 1 GB of system RAM gets used, but it crashes within a second. Not sure what's going on there.
-
I wanted to test CUDA UVM support (#8035), mostly to see if it's viable on my 8 GB VRAM setup (offloading most of the layers to the GPU plus flash attention is already good enough in practice for me). However, llama.cpp hits an unexpected out-of-memory error and crashes when it starts to warm the model up:
Looking at the code in question, this seems especially odd to me, as there's no obvious dynamic allocation taking place there. I also do have the `nvidia-uvm` kernel module loaded.
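For reference, this is the minimal standalone check I'd expect to pass under UVM. It's only a sketch, not llama.cpp code: the 1.5x buffer size, the placeholder file name, and the use of `cudaMemset` to force device-side population are my own assumptions. With the `nvidia-uvm` module loaded, `cudaMallocManaged` should succeed even for buffers larger than physical VRAM, with pages migrated between host and device on demand, which is exactly why an out-of-memory error during warm-up surprises me.

```cpp
// uvm_check.cu (placeholder name) -- standalone UVM oversubscription sketch,
// not llama.cpp code. Allocates a managed buffer ~1.5x the size of total VRAM
// and touches it from the device; with working UVM this should page to host
// memory rather than fail outright.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    printf("VRAM: %zu MiB free / %zu MiB total\n", free_b >> 20, total_b >> 20);

    // Deliberately over-subscribe: 1.5x total VRAM (assumes enough host RAM).
    size_t n = total_b + total_b / 2;
    void *p = nullptr;
    cudaError_t err = cudaMallocManaged(&p, n);
    if (err != cudaSuccess) {
        printf("cudaMallocManaged(%zu MiB) failed: %s\n", n >> 20, cudaGetErrorString(err));
        return 1;
    }

    // cudaMemset runs on the device, so it forces the managed pages to be
    // populated/migrated GPU-side, similar in spirit to a model warm-up pass.
    err = cudaMemset(p, 0, n);
    printf("device memset over %zu MiB: %s\n", n >> 20, cudaGetErrorString(err));

    cudaFree(p);
    return 0;
}
```

Compile with nvcc and run it; if this over-subscribed memset also fails on your machine, the problem is probably below llama.cpp. As far as I understand docs/build.md, llama.cpp's unified-memory path is opted into via the GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 environment variable, so the comparison should be meaningful.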