How to Truncate the input prompt? #2443
Closed
AamodThakur asked this question in Q&A
Replies: 2 comments · 5 replies
-
Same question.
-
You can encode the prompt with the tokenizer yourself, truncate the token IDs, and pass them to llm.generate via prompt_token_ids:
from vllm import LLM

llm = LLM(model="lmsys/vicuna-7b-v1.5", max_model_len=4096, max_num_batched_tokens=4096, tensor_parallel_size=2)
tokenizer = llm.get_tokenizer()
# Encode to a plain list of token IDs; with return_tensors="pt" the slice below
# would cut rows of a 2-D tensor instead of tokens.
prompt_token_ids = tokenizer.encode("<PROMPT>")
# Truncate prompt_token_ids, keeping only the last MAX_PROMPT_TOKEN tokens
prompt_token_ids = prompt_token_ids[-MAX_PROMPT_TOKEN:]
# prompt_token_ids takes a list of token-ID lists, one per prompt
outputs = llm.generate(prompt_token_ids=[prompt_token_ids])
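If you would rather let the Hugging Face tokenizer handle the truncation instead of slicing manually, here is a sketch of the same idea using its built-in truncation options. MAX_PROMPT_TOKEN = 4096 is just the value from this thread, and truncation_side = "left" mirrors the tail-keeping slice above (the tokenizer output further down shows it defaults to truncation_side='right'):
from vllm import LLM

llm = LLM(model="lmsys/vicuna-7b-v1.5", max_model_len=4096, max_num_batched_tokens=4096, tensor_parallel_size=2)
tokenizer = llm.get_tokenizer()  # the underlying Hugging Face LlamaTokenizerFast

MAX_PROMPT_TOKEN = 4096  # value used in this thread
# Keep the end of the prompt rather than the beginning (the default side is 'right')
tokenizer.truncation_side = "left"
encoded = tokenizer("<PROMPT>", truncation=True, max_length=MAX_PROMPT_TOKEN)

outputs = llm.generate(prompt_token_ids=[encoded["input_ids"]])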
-
We are using the vicuna model and setting the maximum length to 4096:
llm = LLM(model="lmsys/vicuna-7b-v1.5", max_model_len=4096, max_num_batched_tokens=4096, tensor_parallel_size=2)
We are sending an input prompt of more than 10K tokens and want it to be truncated to 4096, but we are getting the error "Input prompt (25597 tokens) is too long and exceeds limit of 4096".
How can we enable truncation (set truncate to true) in vLLM?
Output of get_tokenizer():
LlamaTokenizerFast(name_or_path='lmsys/vicuna-7b-v1.5', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<unk>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={ 0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), }
Thanks.
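For this exact setup, a minimal sketch that combines the reply above with a reserved output budget, so the truncated prompt plus the generated tokens stays within max_model_len. The 256-token MAX_NEW_TOKENS value is purely illustrative, not a vLLM default:
from vllm import LLM, SamplingParams

MAX_NEW_TOKENS = 256  # illustrative output budget, not a vLLM default

llm = LLM(model="lmsys/vicuna-7b-v1.5", max_model_len=4096, max_num_batched_tokens=4096, tensor_parallel_size=2)
tokenizer = llm.get_tokenizer()

token_ids = tokenizer.encode("<PROMPT>")  # the >10K-token prompt
# Keep the tail of the prompt, leaving room for MAX_NEW_TOKENS generated tokens
budget = 4096 - MAX_NEW_TOKENS
if len(token_ids) > budget:
    token_ids = token_ids[-budget:]

sampling_params = SamplingParams(max_tokens=MAX_NEW_TOKENS)
outputs = llm.generate(prompt_token_ids=[token_ids], sampling_params=sampling_params)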