Max prompt tokens/sequence length limit in vllm core scheduler #446
yuanheng-zhao announced in Q&A
Replies: 2 comments
- Hi, we just fixed this in the latest main. Please retry.
- WARNING 07-28 03:23:18 scheduler.py:196] Input prompt (2716 tokens) is too long and exceeds limit of 4096
- I noticed that the following block (https://github.com/vllm-project/vllm/blob/main/vllm/core/scheduler.py#L193) was added to the vllm core scheduler as a fix for issue #113. I wonder why we're not using `num_prompt_tokens > self.scheduler_config.max_seq_len` as the condition? The current check appears to filter out input sequences whose length exactly equals the limit (e.g. `--input-len 1024` in benchmarks). Thank you for your guidance!
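For readers who want to reproduce the off-by-one behaviour described above, here is a minimal, self-contained sketch comparing the two conditions. The `SchedulerConfig` dataclass, the `admit_prompt` helpers, and the `max_seq_len` value of 1024 are illustrative assumptions, not the actual vLLM code at scheduler.py#L193; only the warning message format is taken from the log line quoted in the second comment.

```python
# Minimal sketch, assuming the existing scheduler check uses ">=" against
# max_seq_len. Names and values here are hypothetical stand-ins.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


@dataclass
class SchedulerConfig:
    max_seq_len: int = 1024  # hypothetical limit, matching --input-len 1024


def admit_prompt(num_prompt_tokens: int, config: SchedulerConfig) -> bool:
    """Behaviour being questioned: ">=" also rejects prompts whose length
    is exactly max_seq_len."""
    if num_prompt_tokens >= config.max_seq_len:
        logger.warning(
            "Input prompt (%d tokens) is too long and exceeds limit of %d",
            num_prompt_tokens, config.max_seq_len)
        return False
    return True


def admit_prompt_proposed(num_prompt_tokens: int, config: SchedulerConfig) -> bool:
    """Variant suggested in the question: ">" lets exact-length prompts pass."""
    if num_prompt_tokens > config.max_seq_len:
        logger.warning(
            "Input prompt (%d tokens) is too long and exceeds limit of %d",
            num_prompt_tokens, config.max_seq_len)
        return False
    return True


if __name__ == "__main__":
    cfg = SchedulerConfig(max_seq_len=1024)
    print(admit_prompt(1024, cfg))           # False: rejected by ">="
    print(admit_prompt_proposed(1024, cfg))  # True: accepted by ">"
```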