hot reload llama.cpp #3784
spirobel started this conversation in Show and tell
I made a thing to hot reload llama.cpp code.
https://github.com/spirobel/bunny-llama
So you can keep the model loaded, change and recompile the prompting function, and rerun it instantly without having to reload the model.
The bun clone command will fork the llama.cpp repo specified in the env file. You can replace it with your own if you want to experiment with llama.cpp yourself. I added an api-llama example in mine: https://github.com/spirobel/llama.cpp/tree/api-llama-example/examples/api-llama. It just has a very straightforward load_model() and prompt() function.
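To give a feel for how the pieces fit together, here is a rough sketch of binding those two functions from Bun and swapping the prompting code out from under a loaded model. This is illustration only, not the actual bunny-llama code: the library paths, the exact signatures, and the make target are placeholders I made up for the example.

```typescript
// Sketch only: paths, symbol names and signatures are placeholders.
import { dlopen, FFIType, ptr } from "bun:ffi";

// Load the model once through a stable shared library. The returned
// handle (and the weights behind it) stays alive for the whole session.
const core = dlopen("./libapi_llama.so", {
  load_model: { args: [FFIType.ptr], returns: FFIType.ptr },
});
const modelPath = Buffer.from("./models/model.gguf\0");
const model = core.symbols.load_model(ptr(modelPath));

// The prompting code lives in its own small shared library that can be
// rebuilt and reopened without touching the loaded model.
function openPromptLib(path: string) {
  return dlopen(path, {
    prompt: { args: [FFIType.ptr, FFIType.ptr], returns: FFIType.cstring },
  });
}
let promptLib = openPromptLib("./libprompt.so");

// Edit the prompt code, then: rebuild, close the old handle, reopen.
// (In practice the rebuilt .so may need a fresh filename so dlopen
// doesn't hand back the cached copy.)
function reloadPromptCode() {
  Bun.spawnSync(["make", "libprompt.so"]); // rebuild only the prompt code
  promptLib.close();
  promptLib = openPromptLib("./libprompt.so");
}

const question = Buffer.from("Why is the sky blue?\0");
console.log(promptLib.symbols.prompt(model, ptr(question)));
```

The point of the split is that the expensive part (loading the weights) happens once, while the cheap part (the prompt logic) can be rebuilt and reopened in a tight loop.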
libllama.so already has so much stuff in it :-) I wanted a straightforward API to play with.
This example is also helpful for people who want to integrate and ship llama.cpp as part of a bigger app. It will also build the CUDA version statically, so your users won't need to have the CUDA toolkit installed!