
Commit 3015851

llama : add getters for n_threads/n_threads_batch (ggml-org#7464)
* llama : add getters for n_threads/n_threads_batch

This commit adds two new functions to the llama API. The functions can
be used to get the number of threads used for generating a single token
and the number of threads used for prompt and batch processing
(multiple tokens).

The motivation for this is that we want to be able to get the number of
threads that a context is using. The main use case is
testing/verification that the number of threads is set correctly.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! llama : add getters for n_threads/n_threads_batch

Rename the getters to llama_n_threads and llama_n_threads_batch.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
1 parent 55ac3b7 commit 3015851

File tree

2 files changed: +14 −0 lines changed

llama.cpp

Lines changed: 8 additions & 0 deletions
@@ -17410,6 +17410,14 @@ void llama_set_n_threads(struct llama_context * ctx, uint32_t n_threads, uint32_t n_threads_batch) {
     ctx->cparams.n_threads_batch = n_threads_batch;
 }
 
+uint32_t llama_n_threads(struct llama_context * ctx) {
+    return ctx->cparams.n_threads;
+}
+
+uint32_t llama_n_threads_batch(struct llama_context * ctx) {
+    return ctx->cparams.n_threads_batch;
+}
+
 void llama_set_abort_callback(struct llama_context * ctx, bool (*abort_callback)(void * data), void * abort_callback_data) {
     ctx->abort_callback      = abort_callback;
     ctx->abort_callback_data = abort_callback_data;

llama.h

Lines changed: 6 additions & 0 deletions
@@ -759,6 +759,12 @@ extern "C" {
     // n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
     LLAMA_API void llama_set_n_threads(struct llama_context * ctx, uint32_t n_threads, uint32_t n_threads_batch);
 
+    // Get the number of threads used for generation of a single token.
+    LLAMA_API uint32_t llama_n_threads(struct llama_context * ctx);
+
+    // Get the number of threads used for prompt and batch processing (multiple tokens).
+    LLAMA_API uint32_t llama_n_threads_batch(struct llama_context * ctx);
+
     // Set whether to use causal attention or not
     // If set to true, the model will only attend to the past tokens
     LLAMA_API void llama_set_causal_attn(struct llama_context * ctx, bool causal_attn);
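
For reference, below is a minimal sketch of the testing/verification use case the commit message describes, written against the public C API as it stood at this commit. The snippet is illustrative and not part of the change; the model path "model.gguf" is a placeholder assumption.

#include "llama.h"
#include <assert.h>
#include <stdio.h>

int main(void) {
    llama_backend_init();

    // Hypothetical model path -- replace with a real GGUF file.
    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_threads       = 4; // threads for single-token generation
    cparams.n_threads_batch = 8; // threads for prompt/batch processing
    struct llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Verify the context reports the configured thread counts.
    assert(llama_n_threads(ctx)       == 4);
    assert(llama_n_threads_batch(ctx) == 8);

    // The getters also reflect later changes made via llama_set_n_threads.
    llama_set_n_threads(ctx, 2, 6);
    assert(llama_n_threads(ctx)       == 2);
    assert(llama_n_threads_batch(ctx) == 6);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}

Both getters read straight from ctx->cparams, so they return whatever the initial context params or the most recent llama_set_n_threads call stored.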
