
[RFC]: Join the Multi-LoRA and Multi-LoRA Dynamic Serving feature development #396


Closed
ZhengJun9 opened this issue Mar 26, 2025 · 2 comments
Labels: RFC (Request For Comments)

Comments


ZhengJun9 commented Mar 26, 2025

Motivation.

We would like to join the development of the Multi-LoRA and Multi-LoRA Dynamic Serving features.

Proposed Change.

We want to implement the following:
1. Multi-LoRA usage should be the same as in vLLM; see https://docs.vllm.ai/en/latest/features/lora.html
2. In addition, for production environments, dynamically served LoRA adapters should be persistent: if the Docker container is restarted for any reason, the loaded/unloaded LoRA adapters should not roll back to their initial state. (A sketch of the runtime load/unload flow is shown below.)
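
To make the intended usage concrete, here is a minimal sketch of loading and unloading an adapter at runtime through vLLM's documented dynamic-LoRA endpoints. It assumes a server already running with `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`; the base URL, adapter name, and adapter path are placeholders, not values from this RFC.

```python
# Minimal sketch of dynamic adapter load/unload against a running
# OpenAI-compatible vLLM server. URL, adapter name, and path below
# are placeholders for illustration only.
import requests

BASE_URL = "http://localhost:8000"  # hypothetical server address

# Register a new adapter at runtime.
resp = requests.post(
    f"{BASE_URL}/v1/load_lora_adapter",
    json={"lora_name": "sql-lora", "lora_path": "/path/to/sql-lora-adapter"},
)
print("load:", resp.status_code, resp.text)

# Remove the adapter when it is no longer needed.
resp = requests.post(
    f"{BASE_URL}/v1/unload_lora_adapter",
    json={"lora_name": "sql-lora"},
)
print("unload:", resp.status_code, resp.text)
```

Under the persistence requirement in point 2, adapters registered this way should survive a container restart instead of having to be re-registered by hand.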

Feedback Period.

We plan to finish this work within two weeks.

CC List.

Yikun
wangxiyuan

Any Other Things.

No response

wangxiyuan pushed a commit that referenced this issue Apr 17, 2025
### What this PR does / why we need it?
According to this RFC [[RFC]: Join the Multi-LoRA and Multi-LoRA Dynamic Serving feature development #396](#396) and the [vLLM Ascend Roadmap Q2 2025 #448](#448), we pull in the relevant code to support (1) Multi-LoRA and (2) Multi-LoRA Dynamic Serving.

The LoRA reference is here: [LoRA reference](https://docs.vllm.ai/en/latest/features/lora.html)

### Does this PR introduce _any_ user-facing change?

The following OpenAI-compatible HTTP APIs will be supported:
/v1/load_lora_adapter
/v1/unload_lora_adapter

### How was this patch tested?
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py
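
For readers who have not run it, the referenced multilora_inference.py exercises offline generation with per-request LoRA adapters. The following is a rough sketch of that flow using vLLM's offline API; the model name and adapter path are placeholders rather than the values used in the actual example script.

```python
# Rough sketch of offline multi-LoRA inference with vLLM's Python API.
# Model name and adapter path are placeholders for illustration.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Write a SQL query that counts users by country."],
    sampling_params,
    # Each request can carry its own adapter: (adapter name, adapter id, path).
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora-adapter"),
)
for out in outputs:
    print(out.outputs[0].text)
```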

---------

Signed-off-by: paulyu <paulyu0307@gmail.com>
Co-authored-by: paulyu <paulyu0307@gmail.com>
Yikun (Collaborator) commented Apr 26, 2025

According to @paulyu12's feedback, they are working on Multi-LoRA for the V1 engine and it will be ready soon.

ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this issue Apr 27, 2025
paulyu12 added a commit to paulyu12/vllm-ascend that referenced this issue Apr 28, 2025
ganyi1996ppo pushed a commit that referenced this issue Apr 28, 2025

> [[Release]: vLLM Ascend v0.7.3 release checklist](#644 (comment))
Yikun added the RFC (Request For Comments) label May 11, 2025
jesse996 added a commit to jesse996/vllm-ascend that referenced this issue May 12, 2025
What this PR does / why we need it?
According to this RFC vllm-project#396 and vllm-project#448, we pull in the relevant code to support LoRA in the V1 engine.

Does this PR introduce any user-facing change?
The following OpenAI-compatible HTTP APIs will be supported:
/v1/load_lora_adapter
/v1/unload_lora_adapter

How was this patch tested?
git clone https://github.com/vllm-project/vllm.git
cd vllm/examples/offline_inference/ && python3 multilora_inference.py

Signed-off-by: jesse <szxfml@gmail.com>
paulyu12 (Collaborator) commented

V1 Engine LoRA support is ready to be merged (#893), so we are closing this RFC.
