13b model on M1 with 8gb RAM very slow #1500
Unanswered
joseph6377 asked this question in Q&A
Replies: 2 comments 2 replies
- You see, the 13B model is larger than your physical RAM, so it gets cached in virtual memory (swapped to disk) and runs much slower. You should use a 7B model for high speed.
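As a rough sanity check, here is a back-of-the-envelope comparison in plain Python. The figures are taken straight from the llama.cpp load log in the post below; this is just arithmetic, not anything from the llama.cpp API:

```python
# Back-of-the-envelope check: does the 13B q4_0 model fit in 8 GB of RAM?
# All figures come from the llama.cpp load log in the question below.

mem_required_mb = 9807.48    # "mem required" reported by llama.cpp
state_mb = 1608.00           # "+ ... per state" (KV cache / scratch)
physical_ram_mb = 8 * 1024   # M1 with 8 GB RAM

total_mb = mem_required_mb + state_mb
deficit_mb = total_mb - physical_ram_mb

print(f"needed:    {total_mb:8.2f} MB")
print(f"available: {physical_ram_mb:8.2f} MB")
print(f"shortfall: {deficit_mb:8.2f} MB -> this spills into swap")
```

With roughly a 3 GB shortfall, part of the model is paged to disk on every pass, which is why generation is so slow regardless of other parameters.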
- Two things:
- Hello, I am a bit of a noob here. I'm running 4-bit quantized models on an M1 with 8 GB of RAM, and the 13B model is very slow. I have also tried setting mlock to true. Are there any other parameters I need to tweak?
llama.cpp: loading model from /Users/jo/Documents/llama.cpp/models/wizard-mega-13B.ggml.q4_0.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 90.75 KB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
..............................................................warning: failed to mlock 44236800-byte buffer (after previously locking 5073518592 bytes): Resource temporarily unavailable
......................................
llama_init_from_file: kv self size = 400.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
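On the mlock warning in the log: the process managed to pin about 5 GB (5073518592 bytes) and was then refused the next 44 MB buffer, so the model could not be fully locked into RAM. A minimal sketch for inspecting the locked-memory limit, assuming a Unix-like system (macOS/Linux) where Python's resource module exposes RLIMIT_MEMLOCK:

```python
# Inspect how much memory the OS lets this process mlock().
# Assumes a Unix-like system where Python's resource module
# exposes RLIMIT_MEMLOCK.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

def fmt(v):
    return "unlimited" if v == resource.RLIM_INFINITY else f"{v / 2**20:.0f} MB"

print(f"mlock soft limit: {fmt(soft)}")
print(f"mlock hard limit: {fmt(hard)}")

# Figures from the warning in the log above: ~4.7 GB was locked
# before a further 42 MB buffer was refused.
locked = 5_073_518_592
failed = 44_236_800
print(f"locked before failure: {locked / 2**20:.0f} MB "
      f"(next {failed / 2**20:.0f} MB buffer was refused)")
```

Even if the limit were raised, mlock cannot pin more memory than physically exists, so on an 8 GB machine it will not make a ~11 GB working set fit.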