[Performance]: logit bias implementation uses a slow for loop

### Proposal to improve performance

We were running logit bias with a big dictionary and noticed a significant slowdown in generation. We looked into the implementation and saw that it uses a for loop:

https://github.com/vllm-project/vllm/blob/a79b1224005836bdf0ab6d3bab807d2f5d8a5ef1/vllm/entrypoints/openai/logits_processors.py#L48

Is there a specific reason it is done this way? From quick tests, a `scatter_add` seems to be significantly faster. If this has not been considered before, I'll spend some time on a proper benchmark and a PR.

On my Mac, with a -100 bias applied to 40k tokens out of 150k:
```
In [48]: %timeit f_for(x, logit_bias)
106 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [49]: %timeit f_scatter(x, logit_bias)
3.74 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
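
The two functions timed above are not shown in this issue, so here is a minimal sketch of what they might look like. It assumes `x` is a 1-D logits tensor of vocabulary size and `logit_bias` is a dict mapping integer token ids to float biases; the bodies of `f_for` and `f_scatter` are my reconstruction for illustration, not necessarily the exact code that was benchmarked or what a final PR would contain.

```python
import torch


def f_for(logits: torch.Tensor, logit_bias: dict[int, float]) -> torch.Tensor:
    # Loop-based version: one indexed in-place add per biased token.
    for token_id, bias in logit_bias.items():
        logits[token_id] += bias
    return logits


def f_scatter(logits: torch.Tensor, logit_bias: dict[int, float]) -> torch.Tensor:
    # Vectorized version: build the index and value tensors once,
    # then apply all biases with a single in-place scatter_add_.
    index = torch.tensor(list(logit_bias.keys()), dtype=torch.long, device=logits.device)
    src = torch.tensor(list(logit_bias.values()), dtype=logits.dtype, device=logits.device)
    return logits.scatter_add_(0, index, src)


# Small sanity check (sizes are arbitrary, much smaller than the benchmark above).
vocab_size = 1000
logit_bias = {token_id: -100.0 for token_id in range(0, vocab_size, 4)}
x = torch.randn(vocab_size)

out_for = f_for(x.clone(), logit_bias)
out_scatter = f_scatter(x.clone(), logit_bias)
print(torch.allclose(out_for, out_scatter))
```

Both versions modify the logits in place, so the check clones `x` before each call. Since dict keys are unique, no token id appears twice in `index`, and the scatter produces the same result as the loop.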

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Performance]: logit bias implementation uses a slow for loop #10741
