Building up-to-date llama.cpp with Cosmopolitan #720
Replies: 3 comments 1 reply
-
cc: @stlhood pretty interesting to me!
-
Running Google Gemma 3 1B-Instruct on a Raspberry Pi 5 16GB, I'm getting about 10 tokens/sec. That's really usable for a home-network LLM server if the answers are generally good, and the hardware costs about $150. The story it wrote looks as good as anything an 8B model has ever produced for me; I want to test the 4B variant next. I've got a to-do list on this thing to get all the goodness I get from llamafile.
Am I missing anything here? Is this an interesting path for the llamafile team, or am I better served forking llama.cpp and building a separate project?
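For context, serving a setup like this to the rest of the house is a one-liner with llama.cpp's llama-server. The model filename below is just a placeholder for whichever Gemma 3 GGUF quantization you downloaded:

```sh
# Minimal sketch: expose a Gemma 3 1B GGUF to the local network.
# The .gguf filename is a placeholder; use whatever quantization you have.
# --host 0.0.0.0 listens on all interfaces so other machines can reach it.
./llama-server -m gemma-3-1b-it-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```

Other machines on the network can then talk to it at http://<pi-address>:8080/ through the OpenAI-compatible /v1/chat/completions endpoint.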
-
I think if it's possible, it would be really great to have all of this within the existing llamafile repo; I'm certainly happy to help with some of the development as well as testing and merging. Getting up to date with llama.cpp is certainly something we would like to do, as is making it easier to continue to stay up to date with upstream in the future. I think what you're hinting at is effectively trying to replicate llamafile with support for the newest features, which would be a great contribution.
-
I love llamafile, but the lag behind llama.cpp for new model support is not ideal for my customers, so I decided to explore building the latest llama.cpp with cosmocc. It's challenging and not completely working yet, but I do have a llama-server build that actually runs, loads a model, and does some chat on Ubuntu 24.04 (x86_64) and Windows 11 (x86_64). I will test the build on a Raspberry Pi later tonight and hopefully on an M3 or M4 Mac tomorrow.
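In case anyone wants to poke at it, the rough shape of what I'm doing is below. Treat it as a sketch, not a verified recipe: downloading the toolchain and pointing CMake at cosmocc/cosmoc++ is the standard Cosmopolitan approach, but the exact flags and the source patches llama.cpp needs are still in flux, and it does not build cleanly out of the box.

```sh
# Sketch only, not a working recipe yet: fetch the Cosmopolitan toolchain
# and point llama.cpp's CMake build at it. Expect compile errors that need
# source patches before llama-server links.
mkdir -p cosmocc
cd cosmocc
wget https://cosmo.zip/pub/cosmocc/cosmocc.zip
unzip cosmocc.zip
cd ..

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build \
  -DCMAKE_C_COMPILER="$(pwd)/../cosmocc/bin/cosmocc" \
  -DCMAKE_CXX_COMPILER="$(pwd)/../cosmocc/bin/cosmoc++"
cmake --build build --target llama-server
```

The payoff, if it works, is that the resulting binary is an APE (Actually Portable Executable), which is what lets the same file run on Linux, Windows, and macOS.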
More information here:
ggml-org/llama.cpp#12375
Nobody in the llama.cpp crowd seems interested yet. Is having the latest models available as soon as support lands in llama.cpp important to y'all?
-Brad