
Commit 44d3269

server: allow to specify tokens as strings in logit_bias
1 parent 1912211 commit 44d3269

File tree

2 files changed: 26 additions (+) and 8 deletions (-)

examples/server/README.md

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ node index.js
 `ignore_eos`: Ignore end of stream token and continue generating (default: false).

-`logit_bias`: Modify the likelihood of a token appearing in the generated text completion. For example, use `"logit_bias": [[15043,1.0]]` to increase the likelihood of the token 'Hello', or `"logit_bias": [[15043,-1.0]]` to decrease its likelihood. Setting the value to false, `"logit_bias": [[15043,false]]` ensures that the token `Hello` is never produced (default: []).
+`logit_bias`: Modify the likelihood of a token appearing in the generated text completion. For example, use `"logit_bias": [[15043,1.0]]` to increase the likelihood of the token 'Hello', or `"logit_bias": [[15043,-1.0]]` to decrease its likelihood. Setting the value to false, `"logit_bias": [[15043,false]]` ensures that the token `Hello` is never produced. The tokens can also be represented as strings, e.g. `[["Hello, World!",-0.5]]` will reduce the likelihood of all the individual tokens that represent the string `Hello, World!`, just like the `presence_penalty` does. (default: []).

 `n_probs`: If greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)

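For illustration, a completion request that uses both forms of `logit_bias` could look like the sketch below. The token id 15043 and the string example come from the README paragraph above; the prompt text and the `n_predict` value are placeholders, not part of this commit.

{
  "prompt": "Building a website can be done in",
  "n_predict": 64,
  "logit_bias": [
    [15043, 1.0],
    ["Hello, World!", -0.5]
  ]
}

Each entry is a two-element array: the first element is either a token id or a string to be tokenized, and the second is either a numeric bias or `false` to ban the token(s) outright.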
examples/server/server.cpp

Lines changed: 25 additions & 7 deletions
@@ -619,18 +619,36 @@ struct llama_server_context
             const int n_vocab = llama_n_vocab(model);
             for (const auto &el : *logit_bias)
             {
-                if (el.is_array() && el.size() == 2 && el[0].is_number_integer())
+                if (el.is_array() && el.size() == 2)
                 {
-                    llama_token tok = el[0].get<llama_token>();
-                    if (tok >= 0 && tok < n_vocab)
+                    float bias;
+                    if (el[1].is_number())
                     {
-                        if (el[1].is_number())
+                        bias = el[1].get<float>();
+                    }
+                    else if (el[1].is_boolean() && !el[1].get<bool>())
+                    {
+                        bias = -INFINITY;
+                    }
+                    else
+                    {
+                        continue;
+                    }
+
+                    if(el[0].is_number_integer())
+                    {
+                        llama_token tok = el[0].get<llama_token>();
+                        if (tok >= 0 && tok < n_vocab)
                         {
-                            slot->sparams.logit_bias[tok] = el[1].get<float>();
+                            slot->sparams.logit_bias[tok] = bias;
                         }
-                        else if (el[1].is_boolean() && !el[1].get<bool>())
+                    }
+                    else if (el[0].is_string())
+                    {
+                        auto toks = llama_tokenize(model, el[0].get<std::string>(), false);
+                        for(auto tok : toks)
                         {
-                            slot->sparams.logit_bias[tok] = -INFINITY;
+                            slot->sparams.logit_bias[tok] = bias;
                         }
                     }
                 }

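For reference, the sketch below mirrors the new parsing path as a standalone function. It is an illustration of the logic in the hunk above, not the committed code: it assumes nlohmann::json (which server.cpp already uses), and `tokenize_stub` and `logit_bias_map` are stand-ins for `llama_tokenize` and `slot->sparams.logit_bias`.

// Minimal sketch of the logit_bias parsing added in this commit.
// Assumptions: nlohmann::json is available; tokenize_stub replaces
// llama_tokenize and logit_bias_map replaces slot->sparams.logit_bias.
#include <cmath>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;
using llama_token = int32_t;

// Hypothetical tokenizer: one token per byte, for demonstration only.
static std::vector<llama_token> tokenize_stub(const std::string &text)
{
    std::vector<llama_token> toks;
    for (unsigned char c : text) { toks.push_back((llama_token) c); }
    return toks;
}

static void parse_logit_bias(const json &logit_bias, int n_vocab,
                             std::unordered_map<llama_token, float> &logit_bias_map)
{
    for (const auto &el : logit_bias)
    {
        if (!el.is_array() || el.size() != 2) { continue; }

        // The bias is either a number, or `false` to ban the token(s) outright.
        float bias;
        if (el[1].is_number())                             { bias = el[1].get<float>(); }
        else if (el[1].is_boolean() && !el[1].get<bool>()) { bias = -INFINITY; }
        else                                               { continue; }

        if (el[0].is_number_integer())
        {
            // Integer form: bias a single token id, bounds-checked against the vocab.
            llama_token tok = el[0].get<llama_token>();
            if (tok >= 0 && tok < n_vocab) { logit_bias_map[tok] = bias; }
        }
        else if (el[0].is_string())
        {
            // String form (new in this commit): tokenize the string and apply
            // the same bias to every token it produces.
            for (llama_token tok : tokenize_stub(el[0].get<std::string>()))
            {
                logit_bias_map[tok] = bias;
            }
        }
    }
}

One consequence of the string form, visible above, is that the bias applies to each individual token of the tokenized string rather than to the exact sequence, which is why the README compares it to `presence_penalty`.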