Why must max_num_batched_tokens be <= 65528 when LoRA is enabled? Can the limit be raised? #6247
Closed
junior-zsy asked this question in Q&A
Replies: 1 comment · 2 replies
-
Version 0.5.5 has already removed this restriction; see #7288.
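
For anyone landing here, a minimal sketch of what this means in practice, assuming vLLM >= 0.5.5 (after PR #7288). The model name and the token value below are illustrative assumptions, not taken from this thread:

```python
# Minimal sketch, assuming vLLM >= 0.5.5: a batched-token budget above
# the old 65528 cap should now be accepted with LoRA enabled.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",   # illustrative LoRA-capable base model
    enable_lora=True,
    max_num_batched_tokens=131072,      # > 65528: rejected before 0.5.5, accepted after
)
```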
-
When I used LoRA, I hit this error: "Due to limitations of the custom LoRA CUDA kernel, max_num_batched_tokens must be <= 65528 when LoRA is enabled." I would like to know the specific reason for this limit, and whether it can be raised to a larger value; I have a use case that needs this. Do you have any specific ideas for me? Thank you @Yard1 @simon-mo
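
For context, here is a sketch of the kind of validation that produced this error in vLLM versions before 0.5.5. Only the error text and the 65528 cap come from the message quoted above; the function and constant names are hypothetical. Notably, 65528 is the largest multiple of 8 below 2^16, which hints at a 16-bit indexing or alignment constraint in the custom LoRA kernel (an inference, not confirmed in this thread):

```python
# Hypothetical reconstruction of the pre-0.5.5 check; names are assumptions.
_MAX_LORA_BATCHED_TOKENS = 65528  # largest multiple of 8 below 2**16

def validate_scheduler_config(max_num_batched_tokens: int, lora_enabled: bool) -> None:
    # Reject token budgets the custom LoRA CUDA kernel could not handle.
    if lora_enabled and max_num_batched_tokens > _MAX_LORA_BATCHED_TOKENS:
        raise ValueError(
            "Due to limitations of the custom LoRA CUDA kernel, "
            "max_num_batched_tokens must be <= 65528 when LoRA is enabled."
        )
```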