How is the GGUF file used by llama-server to produce the output? #12589
Unanswered

yogheswaran-A asked this question in Q&A:

Hi,
I am trying to learn how the llama.cpp repo works. I have quantized Llama 3.1 8B and stored the quantized model in GGUF format as described in the llama.cpp repo. I want to know which script(s) are responsible for using the GGUF file to produce the outputs for the input tokens when llama-server is called.

Replies: 1 comment 1 reply
- You can refer to the following examples:
  https://github.com/ggml-org/llama.cpp/tree/master/examples/simple
  https://github.com/ggml-org/llama.cpp/tree/master/examples/simple-chat
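
To make the flow concrete, below is a minimal sketch of what those examples do with a GGUF file, loosely following examples/simple: load the model, tokenize the prompt, then loop over llama_decode and sample the next token from the logits. This is an illustrative sketch rather than the server code itself; the model path and prompt are placeholders, and the C API names shown (llama_model_load_from_file, llama_init_from_model, llama_vocab_is_eog, ...) are from recent llama.cpp versions and may differ in older releases.

```cpp
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    const std::string model_path = "model.gguf";        // placeholder: your quantized GGUF file
    const std::string prompt     = "Hello, my name is"; // placeholder prompt
    const int n_predict          = 32;                  // how many tokens to generate

    llama_backend_init();

    // 1. Load the GGUF file: weights, tokenizer data and hyperparameters all come from it.
    llama_model_params model_params = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(model_path.c_str(), model_params);
    if (model == NULL) {
        fprintf(stderr, "failed to load %s\n", model_path.c_str());
        return 1;
    }
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // 2. Tokenize the prompt (the first call with NULL only measures the token count).
    const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(), NULL, 0, true, true);
    std::vector<llama_token> tokens(n_prompt);
    llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(), tokens.data(), (int32_t) tokens.size(), true, true);

    // 3. Create an inference context (KV cache and compute buffers).
    llama_context_params ctx_params = llama_context_default_params();
    ctx_params.n_ctx   = n_prompt + n_predict;
    ctx_params.n_batch = n_prompt;
    llama_context * ctx = llama_init_from_model(model, ctx_params);

    // 4. A sampler turns the model's output logits into a concrete next token (greedy here).
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // 5. Decode the prompt, then generate one token at a time, feeding each token back in.
    llama_token new_token;
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    for (int n_gen = 0; n_gen < n_predict; ++n_gen) {
        if (llama_decode(ctx, batch) != 0) {
            fprintf(stderr, "llama_decode failed\n");
            break;
        }

        new_token = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, new_token)) {
            break; // end-of-generation token
        }

        char buf[256];
        const int n = llama_token_to_piece(vocab, new_token, buf, sizeof(buf), 0, true);
        printf("%.*s", n, buf);
        fflush(stdout);

        batch = llama_batch_get_one(&new_token, 1);
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Roughly speaking, llama-server wraps this same flow behind an HTTP endpoint: it loads the GGUF once at startup and then runs the tokenize/decode/sample loop for each incoming request, so the server sources in the repo are the place to look after the two examples above.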