Commit cfaa525

server: allow to specify tokens as strings in logit_bias
1 parent: 6e99f2a

2 files changed: 26 additions and 8 deletions

examples/server/README.md

Lines changed: 1 addition & 1 deletion

@@ -185,7 +185,7 @@ node index.js
 
 `ignore_eos`: Ignore end of stream token and continue generating (default: false).
 
-`logit_bias`: Modify the likelihood of a token appearing in the generated text completion. For example, use `"logit_bias": [[15043,1.0]]` to increase the likelihood of the token 'Hello', or `"logit_bias": [[15043,-1.0]]` to decrease its likelihood. Setting the value to false, `"logit_bias": [[15043,false]]` ensures that the token `Hello` is never produced (default: []).
+`logit_bias`: Modify the likelihood of a token appearing in the generated text completion. For example, use `"logit_bias": [[15043,1.0]]` to increase the likelihood of the token 'Hello', or `"logit_bias": [[15043,-1.0]]` to decrease its likelihood. Setting the value to false, `"logit_bias": [[15043,false]]` ensures that the token `Hello` is never produced. The tokens can also be represented as strings, e.g. `[["Hello, World!",-0.5]]` will reduce the likelihood of all the individual tokens that represent the string `Hello, World!`, just like the `presence_penalty` does. (default: []).
 
 `n_probs`: If greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)
 
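To illustrate the documented behaviour, here is a minimal, hypothetical client-side sketch (not part of this commit) that builds a `/completion` request body mixing the existing token-id form with the new string form of `logit_bias`. It uses nlohmann::json, as the server itself does; the prompt text and the bias values are made up for illustration, while token id 15043 ('Hello') is taken from the README example above.

```cpp
// Hypothetical request builder: shows the two accepted shapes of a logit_bias
// entry after this commit ([token_id, number] and [string, number]); a boolean
// false in place of the number would ban the token(s) entirely.
#include <nlohmann/json.hpp>
#include <iostream>

using json = nlohmann::json;

int main() {
    json body;
    body["prompt"]     = "Say hello to the world";   // made-up prompt
    body["n_predict"]  = 32;
    body["logit_bias"] = json::array({
        json::array({15043, 1.0}),            // token-id form: make token 15043 ('Hello') more likely
        json::array({"Hello, World!", -0.5})  // string form: penalize every token of the phrase
    });
    // This JSON would be POSTed to the server's /completion endpoint.
    std::cout << body.dump(2) << std::endl;
    return 0;
}
```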

examples/server/server.cpp

Lines changed: 25 additions & 7 deletions

@@ -625,18 +625,36 @@ struct llama_server_context
             const int n_vocab = llama_n_vocab(model);
             for (const auto &el : *logit_bias)
             {
-                if (el.is_array() && el.size() == 2 && el[0].is_number_integer())
+                if (el.is_array() && el.size() == 2)
                 {
-                    llama_token tok = el[0].get<llama_token>();
-                    if (tok >= 0 && tok < n_vocab)
+                    float bias;
+                    if (el[1].is_number())
                     {
-                        if (el[1].is_number())
+                        bias = el[1].get<float>();
+                    }
+                    else if (el[1].is_boolean() && !el[1].get<bool>())
+                    {
+                        bias = -INFINITY;
+                    }
+                    else
+                    {
+                        continue;
+                    }
+
+                    if(el[0].is_number_integer())
+                    {
+                        llama_token tok = el[0].get<llama_token>();
+                        if (tok >= 0 && tok < n_vocab)
                         {
-                            slot->sparams.logit_bias[tok] = el[1].get<float>();
+                            slot->sparams.logit_bias[tok] = bias;
                         }
-                        else if (el[1].is_boolean() && !el[1].get<bool>())
+                    }
+                    else if (el[0].is_string())
+                    {
+                        auto toks = llama_tokenize(model, el[0].get<std::string>(), false);
+                        for(auto tok : toks)
                         {
-                            slot->sparams.logit_bias[tok] = -INFINITY;
+                            slot->sparams.logit_bias[tok] = bias;
                         }
                     }
                 }
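Restated outside the server, the new parsing logic amounts to: pick the bias first (a number, or -INFINITY for a literal `false`), then apply it either to one token id or to every token of a string. The sketch below is self-contained and hypothetical: `parse_logit_bias` and `fake_tokenize` are made-up names, and the byte-per-token tokenizer merely stands in for `llama_tokenize(model, text, false)`; the real code writes into `slot->sparams.logit_bias` as shown in the hunk above.

```cpp
// Self-contained sketch of the logit_bias parsing rules introduced above.
#include <nlohmann/json.hpp>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using json = nlohmann::json;
using llama_token = std::int32_t;

// Stand-in for llama_tokenize(model, text, false): one "token" per byte.
static std::vector<llama_token> fake_tokenize(const std::string & text) {
    std::vector<llama_token> toks;
    for (unsigned char c : text) {
        toks.push_back(static_cast<llama_token>(c));
    }
    return toks;
}

// Returns token -> bias, mirroring what the server stores in slot->sparams.logit_bias.
static std::unordered_map<llama_token, float> parse_logit_bias(const json & logit_bias, int n_vocab) {
    std::unordered_map<llama_token, float> bias_map;
    for (const auto & el : logit_bias) {
        if (!el.is_array() || el.size() != 2) {
            continue;
        }
        // Second element: a number, or the literal false meaning "never produce this token".
        float bias;
        if (el[1].is_number()) {
            bias = el[1].get<float>();
        } else if (el[1].is_boolean() && !el[1].get<bool>()) {
            bias = -INFINITY;
        } else {
            continue;
        }
        if (el[0].is_number_integer()) {
            // Original form: a single token id, range-checked against the vocabulary.
            const llama_token tok = el[0].get<llama_token>();
            if (tok >= 0 && tok < n_vocab) {
                bias_map[tok] = bias;
            }
        } else if (el[0].is_string()) {
            // New form: the same bias is applied to every token of the string.
            for (const llama_token tok : fake_tokenize(el[0].get<std::string>())) {
                bias_map[tok] = bias;
            }
        }
    }
    return bias_map;
}

int main() {
    const json logit_bias = json::parse(R"([[15043, 1.0], ["Hi!", -0.5], [42, false]])");
    for (const auto & [tok, bias] : parse_logit_bias(logit_bias, /*n_vocab=*/32000)) {
        std::cout << tok << " -> " << bias << "\n";
    }
    return 0;
}
```

As in the server code, a later entry for the same token simply overwrites an earlier one, and the string branch does not range-check the produced ids against `n_vocab`, since the tokenizer only returns valid vocabulary ids.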
