hot reload llama.cpp #3784
spirobel started this conversation in Show and tell
I made a thing to hot reload llama.cpp code.
https://github.com/spirobel/bunny-llama
So you can keep the model loaded, change and recompile the prompting function, and rerun it instantly without having to reload the model.
The bun clone command will fork the llama.cpp repo specified in the env file. You can replace it with your own if you want to experiment with llama.cpp yourself. I added an api-llama example in mine: https://github.com/spirobel/llama.cpp/tree/api-llama-example/examples/api-llama. It just has a very straightforward load_model() and prompt() function.
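To give a feel for how the pieces fit together, here is a rough sketch of binding those two functions from Bun and swapping the prompting code out from under a loaded model. This is illustration only, not the actual bunny-llama code: the library paths, the exact signatures, and the make target are placeholders I made up for the example.

```typescript
// Sketch only: paths, symbol names and signatures are placeholders.
import { dlopen, FFIType, ptr } from "bun:ffi";

// Load the model once through a stable shared library. The returned
// handle (and the weights behind it) stays alive for the whole session.
const core = dlopen("./libapi_llama.so", {
  load_model: { args: [FFIType.ptr], returns: FFIType.ptr },
});
const modelPath = Buffer.from("./models/model.gguf\0");
const model = core.symbols.load_model(ptr(modelPath));

// The prompting code lives in its own small shared library that can be
// rebuilt and reopened without touching the loaded model.
function openPromptLib(path: string) {
  return dlopen(path, {
    prompt: { args: [FFIType.ptr, FFIType.ptr], returns: FFIType.cstring },
  });
}
let promptLib = openPromptLib("./libprompt.so");

// Edit the prompt code, then: rebuild, close the old handle, reopen.
// (In practice the rebuilt .so may need a fresh filename so dlopen
// doesn't hand back the cached copy.)
function reloadPromptCode() {
  Bun.spawnSync(["make", "libprompt.so"]); // rebuild only the prompt code
  promptLib.close();
  promptLib = openPromptLib("./libprompt.so");
}

const question = Buffer.from("Why is the sky blue?\0");
console.log(promptLib.symbols.prompt(model, ptr(question)));
```

The point of the split is that the expensive part (loading the weights) happens once, while the cheap part (the prompt logic) can be rebuilt and reopened in a tight loop.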
libllama.so already has so much stuff in it :-) I wanted a straightforward API to play with.
This example is also helpful for people who want to integrate and ship llama.cpp as part of a bigger app. It will also build the CUDA version statically, so your users won't need to have the CUDA toolkit installed!