
[Platform][Worker][ModelRunner] Add LoRA & Multi-LoRA support #521


Merged
3 commits merged into vllm-project:main on Apr 17, 2025

Conversation

paulyu12
Collaborator

What this PR does / why we need it?

According to the RFC "[RFC]: Join the MultiLora and MultiLora Dynammic Serving feature develop" #396 and the vLLM Ascend Roadmap Q2 2025 #448, this PR adds the relevant code to support (1) Multi-LoRA and (2) Multi-LoRA Dynamic Serving.

The LoRA reference is here: https://docs.vllm.ai/en/latest/features/lora.html

Does this PR introduce any user-facing change?

The following OpenAI-compatible HTTP APIs will be supported (see the client sketch below):
/v1/load_lora_adapter
/v1/unload_lora_adapter
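
For dynamic serving, adapters can be registered and removed at runtime through these endpoints. The sketch below is illustrative rather than code from this PR: it assumes an OpenAI-compatible server launched with `--enable-lora` and with runtime LoRA updating allowed (e.g. `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`), and the adapter name and path are placeholders.

```python
# Illustrative client for the dynamic LoRA serving endpoints.
# Assumes a vLLM OpenAI-compatible server on localhost:8000 started with
# --enable-lora and runtime LoRA updating enabled (assumption, not verified
# against this PR); adapter name and path are placeholders.
import requests

BASE_URL = "http://localhost:8000"

# Register a new adapter without restarting the server.
resp = requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "my_adapter", "lora_path": "/path/to/my_adapter"},
)
print("load:", resp.status_code, resp.text)

# Subsequent completion requests can set "model": "my_adapter" to use it.

# Remove the adapter once it is no longer needed.
resp = requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "my_adapter"},
)
print("unload:", resp.status_code, resp.text)
```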

How was this patch tested?

git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py
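
For context, the upstream example exercises several adapters in one offline run. A condensed sketch of that usage, following the vLLM LoRA API (the base model and adapter path below are placeholders, not values taken from this PR):

```python
# Condensed offline Multi-LoRA sketch in the spirit of multilora_inference.py.
# Model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora turns on LoRA support; max_loras caps adapters resident per batch.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=2)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request may carry its own LoRARequest(name, id, local_path), so
# different adapters can be served by the same engine.
outputs = llm.generate(
    ["Write a SQL query that counts the rows of a table named users."],
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```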

Signed-off-by: paulyu <paulyu0307@gmail.com>
paulyu12 and others added 2 commits April 17, 2025 14:51
@wangxiyuan
Collaborator

Thanks for your contribution. I'll merge this feature first. Let's improve the performance and address any remaining problems in follow-up PRs.

@wangxiyuan wangxiyuan merged commit 697908f into vllm-project:main Apr 17, 2025
17 of 27 checks passed
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
[Platform][Worker][ModelRunner] Add LoRA & Multi-LoRA support (vllm-project#521)

paulyu12 added a commit to paulyu12/vllm-ascend that referenced this pull request Apr 28, 2025
[Platform][Worker][ModelRunner] Add LoRA & Multi-LoRA support (vllm-project#521)
