Building up-to-date llama.cpp with Cosmopolitan #720
Replies: 3 comments 1 reply
-
cc: @stlhood pretty interesting to me!
-
Running Google Gemma 3 1B-Instruct on a Raspberry Pi 5 16GB, I'm getting about 10 tokens/sec. That's really usable for a home-network LLM server if the answers are generally good, and the hardware costs about $150. The story it wrote looks as good as anything an 8B model has ever produced for me; I want to test the 4B variant next. I've got a to-do list on this thing to get all the goodness I get from llamafile.
Am I missing anything here? Is this an interesting path for the llamafile team, or am I better served forking llama.cpp and building a separate project?
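For context, serving a setup like this to the rest of the house is a one-liner with llama.cpp's llama-server. The model filename below is just a placeholder for whichever Gemma 3 GGUF quantization you downloaded:

```sh
# Minimal sketch: expose a Gemma 3 1B GGUF to the local network.
# The .gguf filename is a placeholder; use whatever quantization you have.
# --host 0.0.0.0 listens on all interfaces so other machines can reach it.
./llama-server -m gemma-3-1b-it-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```

Other machines on the network can then talk to it at http://<pi-address>:8080/ through the OpenAI-compatible /v1/chat/completions endpoint.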
-
I think if it's possible, it would be really great to have all of this within the existing llamafile repo; I'm certainly happy to help with some of the development as well as testing and merging. Getting up to date with llama.cpp is certainly something we would like to do, as is making it easier to continue to stay up to date with upstream in the future. I think what you're hinting at is effectively trying to replicate llamafile with support for the newest features, which would be a great contribution.
-
I love llamafile, but the lag behind llama.cpp for new model support is not ideal for my customers, so I decided to explore building the latest llama.cpp with cosmocc. It's challenging and not completely working yet, but I do have a llama-server build that actually runs, loads a model, and does some chat on Ubuntu 24.04 (x86_64) and Windows 11 (x86_64). I will test the build on a Raspberry Pi later tonight and hopefully on an M3 or M4 Mac tomorrow.
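In case anyone wants to poke at it, the rough shape of what I'm doing is below. Treat it as a sketch, not a verified recipe: downloading the toolchain and pointing CMake at cosmocc/cosmoc++ is the standard Cosmopolitan approach, but the exact flags and the source patches llama.cpp needs are still in flux, and it does not build cleanly out of the box.

```sh
# Sketch only, not a working recipe yet: fetch the Cosmopolitan toolchain
# and point llama.cpp's CMake build at it. Expect compile errors that need
# source patches before llama-server links.
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
cd ..

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build \
  -DCMAKE_C_COMPILER="$(pwd)/../cosmocc/bin/cosmocc" \
  -DCMAKE_CXX_COMPILER="$(pwd)/../cosmocc/bin/cosmoc++"
cmake --build build --target llama-server
```

The payoff, if it works, is that the resulting binary is an APE (Actually Portable Executable), which is what lets the same file run on Linux, Windows, and macOS.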
More information here:
ggml-org/llama.cpp#12375
Nobody in the llama.cpp crowd seems interested yet. Is having the latest models available as soon as support lands in llama.cpp important to y'all?
-Brad