Open
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- 2. Please use English; otherwise, it will be closed.
Motivation
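With mid-sized (30B-70B) LLMs, not all requests benefit from prefill-decode (PD) disaggregation (e.g. input tokens <= 128, output tokens >= 512). For such requests, remote prefill with conditional disaggregation would be a nice-to-have feature: prefill runs on a remote prefill worker only when the request is expected to benefit, and otherwise runs locally on the decode worker.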
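To make the request concrete, here is a minimal sketch of what the per-request decision could look like. Everything in it is an assumption for illustration: `RoutingPolicy`, `max_local_prefill_tokens`, and `use_remote_prefill` are hypothetical names, not existing SGLang APIs, and the 128-token default is just the example figure above, not a tuned value. It only looks at input length, since output length is not known when the request arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    # Hypothetical knob: prompts up to this many tokens are prefilled locally
    # on the decode worker. 128 mirrors the example in this issue.
    max_local_prefill_tokens: int = 128


def use_remote_prefill(num_input_tokens: int,
                       policy: RoutingPolicy = RoutingPolicy()) -> bool:
    """Return True if the prompt should be sent to a remote prefill worker."""
    # Short prompts are cheap to prefill, so the KV-cache transfer of a
    # disaggregated setup is unlikely to pay off for them.
    return num_input_tokens > policy.max_local_prefill_tokens


if __name__ == "__main__":
    for n in (64, 128, 129, 4096):
        target = "remote" if use_remote_prefill(n) else "local"
        print(f"{n} input tokens: {target} prefill")
```

A real policy would probably also consider signals such as prefill queue depth or prefix-cache hits; the single threshold is only meant to show the shape of the feature.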
Related resources
Dynamo implements remote prefill in vLLM v0/v1.