[RFC]: Join the Multi-LoRA and Multi-LoRA Dynamic Serving feature development #396
Labels: RFC (Request For Comments)
Comments
wangxiyuan pushed a commit that referenced this issue on Apr 17, 2025:
### What this PR does / why we need it?
According to this RFC [[RFC]: Join the Multi-LoRA and Multi-LoRA Dynamic Serving feature development #396](#396) and the [vLLM Ascend Roadmap Q2 2025 #448](#448), we submit the relevant code to support (1) Multi-LoRA and (2) Multi-LoRA Dynamic Serving. LoRA reference: [LoRA reference](https://docs.vllm.ai/en/latest/features/lora.html)

### Does this PR introduce _any_ user-facing change?
The following OpenAI-compatible HTTP APIs will be supported:
/v1/load_lora_adapter
/v1/unload_lora_adapter

### How was this patch tested?
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py

---------
Signed-off-by: paulyu <paulyu0307@gmail.com>
Co-authored-by: paulyu <paulyu0307@gmail.com>
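For readers unfamiliar with the two endpoints named above, here is a minimal sketch of how dynamic LoRA loading and unloading is typically exercised against a running OpenAI-compatible vLLM server. It assumes the server was started with LoRA enabled and runtime LoRA updating allowed; the server address, adapter name, and adapter path are placeholders rather than values from this PR, and the request-body fields follow the vLLM LoRA documentation linked above.

```python
# Sketch: exercising the dynamic LoRA endpoints against a running
# OpenAI-compatible vLLM server. Assumes the server was started with
# --enable-lora and runtime LoRA updating enabled; the address, adapter
# name, and adapter path below are placeholders.
import requests

BASE_URL = "http://localhost:8000"  # placeholder server address

# Register a LoRA adapter at runtime.
resp = requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "sql-lora", "lora_path": "/path/to/sql-lora-adapter"},
)
print(resp.status_code, resp.text)

# Remove the adapter when it is no longer needed.
resp = requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "sql-lora"},
)
print(resp.status_code, resp.text)
```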
According to @paulyu12's feedback, they are working on Multi-LoRA for the V1 engine; it will be ready soon.
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this issue on Apr 27, 2025:
…roject#521) — same PR description as the Apr 17, 2025 commit above.
paulyu12 added a commit to paulyu12/vllm-ascend that referenced this issue on Apr 28, 2025:
…roject#521) — same PR description as the Apr 17, 2025 commit above.
ganyi1996ppo pushed a commit that referenced this issue on Apr 28, 2025:
### What this PR does / why we need it?
According to this [RFC](#396) and [#448](#448), we submit the relevant code to support (1) Multi-LoRA and (2) Multi-LoRA Dynamic Serving. LoRA reference: [LoRA reference](https://docs.vllm.ai/en/latest/features/lora.html)

### Does this PR introduce _any_ user-facing change?
The following OpenAI-compatible HTTP APIs will be supported:
/v1/load_lora_adapter
/v1/unload_lora_adapter

### How was this patch tested?
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py

> [[Release]: vLLM Ascend v0.7.3 release checklist](#644 (comment))

---------
Signed-off-by: paulyu <paulyu0307@gmail.com>
Co-authored-by: paulyu12 <507435917@qq.com>
Co-authored-by: paulyu <paulyu0307@gmail.com>
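As a companion to the test command above, here is a minimal offline Multi-LoRA sketch in the spirit of multilora_inference.py, following the vLLM LoRA documentation linked in the PR description. The base model name, adapter name, and adapter path are placeholders, not values taken from this PR.

```python
# Sketch of offline Multi-LoRA inference with vLLM, per the LoRA docs.
# The model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0, max_tokens=64)

# Each request can carry its own LoRARequest(name, int_id, path),
# so different adapters can be served from the same base model.
outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
    lora_request=LoRARequest("sql-lora", 1, "/path/to/sql-lora-adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```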
jesse996 added a series of commits to jesse996/vllm-ascend that referenced this issue between May 12 and May 17, 2025, each with the following description:

What this PR does / why we need it?
According to this RFC vllm-project#396 and vllm-project#448, we submit the relevant code to support LoRA in the V1 engine.

Does this PR introduce any user-facing change?
The following OpenAI-compatible HTTP APIs will be supported:
/v1/load_lora_adapter
/v1/unload_lora_adapter

How was this patch tested?
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py

Signed-off-by: jesse <szxfml@gmail.com>
V1 engine LoRA support is ready to be merged (#893), so we are closing this RFC.
Motivation.
We would like to join the development of the Multi-LoRA and Multi-LoRA Dynamic Serving features.
Proposed Change.
We want to implement the following:
1. Multi-LoRA usage should be the same as in vLLM; see https://docs.vllm.ai/en/latest/features/lora.html
2. In addition, for production use, dynamically served LoRA adapters should be persistent: if the Docker container is restarted for any reason, the loaded/unloaded LoRA adapters should not roll back to their initial state (one possible approach is sketched below).
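To make the persistence requirement in item 2 concrete, here is an illustrative sketch only, not an implementation proposal: a small startup script replays adapter registrations recorded in a JSON file on a mounted volume, so adapters loaded via /v1/load_lora_adapter survive a container restart. The file path, record format, and server address are all hypothetical.

```python
# Hypothetical sketch: replay previously loaded LoRA adapters after a
# container restart by re-posting records from a JSON file kept on a
# mounted volume. File path, record fields, and server address are
# illustrative assumptions, not part of this RFC.
import json
import pathlib
import requests

STATE_FILE = pathlib.Path("/data/lora_adapters.json")  # hypothetical mounted volume
BASE_URL = "http://localhost:8000"  # placeholder server address

def restore_adapters() -> None:
    if not STATE_FILE.exists():
        return
    for record in json.loads(STATE_FILE.read_text()):
        # Re-register each adapter that was loaded before the restart.
        requests.post(
            f"{BASE_URL}/v1/load_lora_adapter",
            json={"lora_name": record["name"], "lora_path": record["path"]},
        )

if __name__ == "__main__":
    restore_adapters()
```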
Feedback Period.
We plan to finish this work in two weeks.
CC List.
Yikun
wangxiyuan
Any Other Things.
No response