vLLM output not complete #1095
RickyGunawan09 announced in Q&A
Replies: 2 comments · 1 reply
+1, the answers I get are always incomplete, often even shorter than a single sentence.
Hi, this might be helpful for you: you can set the output length to get complete answers. See line 61 in ee8217e.
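A minimal sketch of that suggestion, assuming vLLM's offline Python API (the model name and prompt here are placeholders taken from this thread, not from the linked code). Checking `finish_reason` tells you whether the answer was cut off by the output-length cap:

```python
from vllm import LLM, SamplingParams

# Load the model with the memory setting mentioned in this thread.
llm = LLM(model="lmsys/vicuna-13b-v1.5", gpu_memory_utilization=0.8)

# max_tokens caps the output length; raise it so the answer has room to finish.
params = SamplingParams(temperature=0, max_tokens=1024)

outputs = llm.generate(["Summarize the following document: ..."], params)
completion = outputs[0].outputs[0]
print(completion.text)
# finish_reason == "length" means the answer was truncated by max_tokens;
# "stop" means the model ended the answer on its own.
print("finish_reason:", completion.finish_reason)
```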
Hi guys,
Thank you for making this super library.
I have a question about the output of vLLM.
I'm using an RTX A6000 GPU (48 GB) with CUDA 12 and the Vicuna-13B-v1.5 (4k) model from lmsys.
vLLM is served with gpu_memory_utilization 0.8.
The request parameters I changed are:
max_tokens 4096
temperature 0
I build a custom prompt with context from a text/document.
Why is the output sometimes not complete?
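For reference, roughly how the setup described above can be reproduced, assuming the OpenAI-compatible server on its default port (the endpoint URL and the prompt are assumptions, not taken from the post):

```python
import requests

# Assumed server launch, matching the settings described above:
#   python -m vllm.entrypoints.openai.api_server \
#       --model lmsys/vicuna-13b-v1.5 --gpu-memory-utilization 0.8

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "lmsys/vicuna-13b-v1.5",
        "prompt": "Answer using the following context: ...",  # placeholder prompt
        # Note: the prompt and the completion share the model's 4k context
        # window, so max_tokens=4096 leaves little room after a long prompt.
        "max_tokens": 4096,
        "temperature": 0,
    },
)
choice = response.json()["choices"][0]
print(choice["text"])
print("finish_reason:", choice["finish_reason"])  # "length" => truncated
```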