-
I want to use llamafile/llamafiler for FIM completion in VS Code, and using the llama.cpp features looks like the best option. The documentation points to a file of llama.cpp examples, but it's too abstract and doesn't really help with running from the command line.

The command I'm trying to translate from llama.cpp is:

    llama-server -m qwen2.5-coder-3b-q8_0.gguf --port 8080 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256

The furthest I can get without "command not found" errors is:

    ./llamafile -m qwen2.5-coder-3b-q8_0.gguf --server -ngl 99 -fa -b 1024 --ctx-size 0

After hours of reading, I still have no clue what the equivalents are for the remaining flags (--port, -ub, -dt, and --cache-reuse). Could anyone offer some guidance or advice?
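For what it's worth, one way to confirm whether a given llamafile build recognizes these flags at all is to filter its usage text, assuming the binary prints a usage listing for --help (a minimal sketch; the flag names are the ones missing from my second command above):

    # Filter llamafile's usage output for the flags in question.
    # Assumes ./llamafile is executable and that --help emits a plain usage listing.
    ./llamafile --help | grep -E -- '--port|-ub|-dt|--cache-reuse'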
Replies: 1 comment 1 reply
-
Llamafile is quite far behind upstream llama.cpp in terms of features, so those flags are likely not supported at this time.
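If those flags matter for your FIM setup, one fallback while llamafile catches up is to serve the model with upstream llama.cpp directly; this is just the original command from the question, restated as a runnable invocation:

    # Upstream llama.cpp server, with all the flags from the original command.
    llama-server -m qwen2.5-coder-3b-q8_0.gguf --port 8080 \
      -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
      --ctx-size 0 --cache-reuse 256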