Looking for help understanding llama-server /metrics #10325

Answered by Allan-Luu
Allan-Luu asked this question in Q&A

Thanks for the answer, @dspasyuk. It pointed me in the right direction!

I found that the relevant logic lives in the function update_slots in examples/server/server.cpp.

These lines start processing the prompts from the server's slots; the tokens consumed here are counted as the initial prompt tokens:

https://github.com/ggerganov/llama.cpp/blob/0fff7fd79818980763a601660f25b01a0cf4b87a/examples/server/server.cpp#L1874-L1880
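
Roughly, as I understand it, this step tokenizes each slot's pending prompt and records the token count, which is presumably what feeds the prompt-token counters exposed by /metrics. Here is a simplified, self-contained sketch of that idea; Slot, tokenize(), and the field names are my stand-ins, not the actual server.cpp types:

```cpp
// Illustrative sketch only -- Slot, tokenize(), and the field names are
// stand-ins for the real server.cpp types, not the actual implementation.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

using Token = int32_t;

struct Slot {
    std::string        prompt;              // raw prompt text assigned to this slot
    std::vector<Token> tokens;              // tokenized prompt
    size_t             n_prompt_tokens = 0; // what the prompt-token metrics count
};

// Placeholder tokenizer: one token per whitespace-separated word.
static std::vector<Token> tokenize(const std::string & text) {
    std::vector<Token> out;
    Token id = 0;
    bool in_word = false;
    for (char c : text) {
        if (c == ' ') { in_word = false; continue; }
        if (!in_word) { out.push_back(id++); in_word = true; }
    }
    return out;
}

int main() {
    Slot slot;
    slot.prompt = "Explain the llama-server /metrics endpoint";

    // This is the step the linked lines perform: the slot's prompt is
    // tokenized and its length recorded as the initial prompt tokens.
    slot.tokens          = tokenize(slot.prompt);
    slot.n_prompt_tokens = slot.tokens.size();

    printf("initial prompt tokens: %zu\n", slot.n_prompt_tokens);
    return 0;
}
```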

Then these lines check whether the prompt exceeds the context size (slot.n_ctx). If so, the input is truncated to fit within n_ctx:

https://github.com/ggerganov/llama.cpp/blob/0fff7fd79818980763a601660f25b01a0cf4b87a/examples/server/server.cpp#L1936-L1958
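
The gist of the truncation, if I read it right, is to preserve the first n_keep tokens and keep as much of the tail as still fits in the context, dropping the middle. The real server.cpp does this in erased blocks and the exact arithmetic varies between versions, so treat this as a minimal sketch of the idea, not the verbatim code:

```cpp
// Minimal sketch of prompt truncation: keep n_keep head tokens, fill the
// rest of n_ctx from the tail, drop the middle. Illustrative only.
#include <cassert>
#include <cstdio>
#include <vector>

using Token = int;

static std::vector<Token> truncate_prompt(const std::vector<Token> & prompt,
                                          size_t n_ctx, size_t n_keep) {
    if (prompt.size() < n_ctx) {
        return prompt;  // fits in the context; nothing to do
    }
    assert(n_keep < n_ctx);

    // Keep the first n_keep tokens ...
    std::vector<Token> out(prompt.begin(), prompt.begin() + n_keep);

    // ... then take the last (n_ctx - n_keep) tokens, so the result fits
    // in the context while preserving the head and the most recent text.
    const size_t n_tail = n_ctx - n_keep;
    out.insert(out.end(), prompt.end() - n_tail, prompt.end());
    return out;
}

int main() {
    std::vector<Token> prompt(100);
    for (size_t i = 0; i < prompt.size(); ++i) prompt[i] = (Token) i;

    const auto truncated = truncate_prompt(prompt, /*n_ctx=*/32, /*n_keep=*/4);
    printf("truncated to %zu tokens (head 0..3, tail 72..99)\n", truncated.size());
    return 0;
}
```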

Here, the function manages the cache that it reuses for efficiency, like…
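
For the cache-reuse part, my understanding is that the server compares the tokens already sitting in the slot's KV cache against the new prompt and only evaluates the suffix past their common prefix (the server tracks this position as n_past, if I recall correctly). A sketch of that idea; common_prefix_len() is a hypothetical helper name, not the actual server.cpp function:

```cpp
// Sketch of KV-cache prefix reuse. common_prefix_len() is a hypothetical
// helper; the names below are illustrative, not the real server.cpp API.
#include <algorithm>
#include <cstdio>
#include <vector>

using Token = int;

static size_t common_prefix_len(const std::vector<Token> & a,
                                const std::vector<Token> & b) {
    const size_t n = std::min(a.size(), b.size());
    size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}

int main() {
    const std::vector<Token> cache_tokens  = {1, 2, 3, 4, 5};      // already evaluated
    const std::vector<Token> prompt_tokens = {1, 2, 3, 9, 10, 11}; // new request

    // Only tokens past the shared prefix need to be evaluated again;
    // the first n_past tokens are served from the KV cache.
    const size_t n_past = common_prefix_len(cache_tokens, prompt_tokens);
    printf("reusing %zu cached tokens, evaluating %zu new ones\n",
           n_past, prompt_tokens.size() - n_past);
    return 0;
}
```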
