vLLM output not complete #1095
RickyGunawan09 announced in Q&A
Replies: 2 comments · 1 reply
+1, the answers I get are always incomplete, often even shorter than a single sentence.
Hi, this might be helpful for you: you can set the output length to get complete answers. See line 61 in ee8217e.
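A minimal sketch of that suggestion, assuming vLLM's offline Python API (the model name and prompt here are placeholders taken from this thread, not from the linked code). Checking `finish_reason` tells you whether the answer was cut off by the output-length cap:

```python
from vllm import LLM, SamplingParams

# Load the model with the memory setting mentioned in this thread.
llm = LLM(model="lmsys/vicuna-13b-v1.5", gpu_memory_utilization=0.8)

# max_tokens caps the output length; raise it so the answer has room to finish.
params = SamplingParams(temperature=0, max_tokens=1024)

outputs = llm.generate(["Summarize the following document: ..."], params)
completion = outputs[0].outputs[0]
print(completion.text)
# finish_reason == "length" means the answer was truncated by max_tokens;
# "stop" means the model ended the answer on its own.
print("finish_reason:", completion.finish_reason)
```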
Hi guys,
Thank you for making this super library.
I have a question about the output of vLLM.
I'm using an RTX A6000 GPU (48 GB) with CUDA 12 and the Vicuna-13B-v1.5 (4k) model from lmsys.
vLLM is served with gpu_memory_utilization 0.8.
The request parameters I changed are:
max_tokens 4096
temperature 0
I build a custom prompt with context from a text/document.
Why is the output sometimes not complete?
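For reference, roughly how the setup described above can be reproduced, assuming the OpenAI-compatible server on its default port (the endpoint URL and the prompt are assumptions, not taken from the post):

```python
import requests

# Assumed server launch, matching the settings described above:
#   python -m vllm.entrypoints.openai.api_server \
#       --model lmsys/vicuna-13b-v1.5 --gpu-memory-utilization 0.8

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "lmsys/vicuna-13b-v1.5",
        "prompt": "Answer using the following context: ...",  # placeholder prompt
        # Note: the prompt and the completion share the model's 4k context
        # window, so max_tokens=4096 leaves little room after a long prompt.
        "max_tokens": 4096,
        "temperature": 0,
    },
)
choice = response.json()["choices"][0]
print(choice["text"])
print("finish_reason:", choice["finish_reason"])  # "length" => truncated
```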