How is the GGUF file used by llama-server to produce the output? #12589
Unanswered

yogheswaran-A asked this question in Q&A:

Hi,
I am trying to learn how the llama.cpp repo works. I have quantized Llama 3.1 8B and stored the quantized model in GGUF format as described in the llama.cpp repo. I want to know which script(s) are responsible for using the GGUF file to produce the outputs for the input tokens when llama-server is called.

Replies: 1 comment 1 reply
- You can refer to the following examples:
  https://github.com/ggml-org/llama.cpp/tree/master/examples/simple
  https://github.com/ggml-org/llama.cpp/tree/master/examples/simple-chat
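
To make the flow concrete, below is a minimal sketch of what those examples do with a GGUF file, loosely following examples/simple: load the model, tokenize the prompt, then loop over llama_decode and sample the next token from the logits. This is an illustrative sketch rather than the server code itself; the model path and prompt are placeholders, and the C API names shown (llama_model_load_from_file, llama_init_from_model, llama_vocab_is_eog, ...) are from recent llama.cpp versions and may differ in older releases.

```cpp
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    const std::string model_path = "model.gguf";        // placeholder: your quantized GGUF file
    const std::string prompt     = "Hello, my name is"; // placeholder prompt
    const int n_predict          = 32;                  // how many tokens to generate

    llama_backend_init();

    // 1. Load the GGUF file: weights, tokenizer data and hyperparameters all come from it.
    llama_model_params model_params = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(model_path.c_str(), model_params);
    if (model == NULL) {
        fprintf(stderr, "failed to load %s\n", model_path.c_str());
        return 1;
    }
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // 2. Tokenize the prompt (the first call with NULL only measures the token count).
    const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(), NULL, 0, true, true);
    std::vector<llama_token> tokens(n_prompt);
    llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(), tokens.data(), (int32_t) tokens.size(), true, true);

    // 3. Create an inference context (KV cache and compute buffers).
    llama_context_params ctx_params = llama_context_default_params();
    ctx_params.n_ctx   = n_prompt + n_predict;
    ctx_params.n_batch = n_prompt;
    llama_context * ctx = llama_init_from_model(model, ctx_params);

    // 4. A sampler turns the model's output logits into a concrete next token (greedy here).
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // 5. Decode the prompt, then generate one token at a time, feeding each token back in.
    llama_token new_token;
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    for (int n_gen = 0; n_gen < n_predict; ++n_gen) {
        if (llama_decode(ctx, batch) != 0) {
            fprintf(stderr, "llama_decode failed\n");
            break;
        }

        new_token = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, new_token)) {
            break; // end-of-generation token
        }

        char buf[256];
        const int n = llama_token_to_piece(vocab, new_token, buf, sizeof(buf), 0, true);
        printf("%.*s", n, buf);
        fflush(stdout);

        batch = llama_batch_get_one(&new_token, 1);
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Roughly speaking, llama-server wraps this same flow behind an HTTP endpoint: it loads the GGUF once at startup and then runs the tokenize/decode/sample loop for each incoming request, so the server sources in the repo are the place to look after the two examples above.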