Pd support #94
Open
mayabar wants to merge 7 commits into llm-d:main from mayabar:pd-support
+157 −72
Commits
10d5088 Add P/D support, respond accordingly to doRemotePrefill and doRemoteD… (mayabar)
276f15a Add test for kvcache transfer time command line parameter. (mayabar)
626c432 Update readme file (mayabar)
98b9585 fixes according PR's comments (mayabar)
651c3b0 added comments for fields (mayabar)
d061f72 fix utils_test - initialize random before (mayabar)
3c699b8 fixes in readme according the PR review (mayabar)
@@ -0,0 +1,8 @@
port: 8001
model: "Qwen/Qwen2-0.5B"
max-num-seqs: 5
mode: "random"
time-to-first-token: 2000
inter-token-latency: 1000
kv-cache-transfer-latency: 100
seed: 100100100
@@ -44,6 +44,10 @@ type completionRequest interface {
	getToolChoice() string
	// getMaxCompletionTokens returns the maximum completion tokens requested
	getMaxCompletionTokens() *int64
	// doRemoteDecode() returns true if do_remote_decode field is true in the request, this means that this is prefill request
	doRemoteDecode() bool
	// doRemotePrefill() returns true if do_remote_prefill field is true in the request, this means that this is decode request
	doRemotePrefill() bool
Review comment: The names in the comments don't match the actual names
Reply: fixed
}

// baseCompletionRequest contains base completion request related information
@@ -54,6 +58,18 @@ type baseCompletionRequest struct {
	StreamOptions streamOptions `json:"stream_options"`
	// Model defines Model name to use for "inference", could be base Model name or one of available LoRA adapters
	Model string `json:"model"`
	// DoRemoteDecode boolean value, true when request's decode will be done on remote pod
	DoRemoteDecode bool `json:"do_remote_decode"`
	// DoRemotePrefill boolean value, true when request's prefill was done on remote pod
	DoRemotePrefill bool `json:"do_remote_prefill"`
	// RemoteBlockIds is a list of block identifiers to process remotely for distributed decoding
	RemoteBlockIds []string `json:"remote_block_ids"`
	// RemoteEngineId is an identifier of the remote inference engine or backend to use for processing requests
	RemoteEngineId string `json:"remote_engine_id"`
	// RemoteHost is a hostname or IP address of the remote server handling prefill
	RemoteHost string `json:"remote_host"`
	// RemotePort is a port of the remote server handling prefill
	RemotePort int `json:"remote_port"`
}

// StreamOptions defines streaming options for streaming requests
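As an illustration of how the new JSON tags map onto a request body, the following sketch decodes only the P/D fields with the standard `encoding/json` package. The `pdFields` type and `parsePDFields` helper are hypothetical stand-ins for the PR's `baseCompletionRequest`, and the sample values in `main` (block ids, engine id, host) are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pdFields copies the P/D fields and JSON tags added in this diff.
// The type name itself is hypothetical.
type pdFields struct {
	DoRemoteDecode  bool     `json:"do_remote_decode"`
	DoRemotePrefill bool     `json:"do_remote_prefill"`
	RemoteBlockIds  []string `json:"remote_block_ids"`
	RemoteEngineId  string   `json:"remote_engine_id"`
	RemoteHost      string   `json:"remote_host"`
	RemotePort      int      `json:"remote_port"`
}

// parsePDFields decodes the P/D fields from a request body.
// Keys without a matching field (such as "model") are ignored by json.Unmarshal.
func parsePDFields(body []byte) (pdFields, error) {
	var f pdFields
	err := json.Unmarshal(body, &f)
	return f, err
}

func main() {
	// Sample decode-side request; all values are made up for illustration.
	body := []byte(`{
		"model": "Qwen/Qwen2-0.5B",
		"do_remote_prefill": true,
		"remote_block_ids": ["b1", "b2"],
		"remote_engine_id": "engine-0",
		"remote_host": "10.0.0.7",
		"remote_port": 8001
	}`)
	f, err := parsePDFields(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}
```

Because the fields are plain `bool`, `string`, and `int` values, a request that omits them decodes to the zero values, so both flags default to false.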
@@ -74,6 +90,14 @@ func (b *baseCompletionRequest) includeUsage() bool {
	return !b.Stream || b.StreamOptions.IncludeUsage
}

func (b *baseCompletionRequest) doRemoteDecode() bool {
	return b.DoRemoteDecode
}

func (b *baseCompletionRequest) doRemotePrefill() bool {
	return b.DoRemotePrefill
}

// completionReqCtx is a context passed in the simulator's flow, it contains the request data needed
// to generate the simulator's response
type completionReqCtx struct {
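The interface comments in this PR state that `doRemoteDecode()` being true marks a prefill request and `doRemotePrefill()` being true marks a decode request. A caller could branch on the two accessors roughly as sketched below. The `pdRequest` interface, `baseReq` type, and `requestRole` function are hypothetical; the precedence given to `doRemoteDecode()` when both flags are set is an arbitrary choice for this sketch, not behaviour shown in the diff.

```go
package main

import "fmt"

// pdRequest is a hypothetical subset of the PR's completionRequest interface.
type pdRequest interface {
	doRemoteDecode() bool
	doRemotePrefill() bool
}

// baseReq is a stand-in for baseCompletionRequest with only the two flags.
type baseReq struct {
	DoRemoteDecode  bool
	DoRemotePrefill bool
}

func (b *baseReq) doRemoteDecode() bool  { return b.DoRemoteDecode }
func (b *baseReq) doRemotePrefill() bool { return b.DoRemotePrefill }

// requestRole classifies a request according to the interface comments:
// decode done remotely means this pod does the prefill, and prefill done
// remotely means this pod does the decode. If both flags are set,
// doRemoteDecode wins (an assumption made only for this sketch).
func requestRole(r pdRequest) string {
	switch {
	case r.doRemoteDecode():
		return "prefill"
	case r.doRemotePrefill():
		return "decode"
	default:
		return "regular"
	}
}

func main() {
	fmt.Println(requestRole(&baseReq{DoRemoteDecode: true}))
	fmt.Println(requestRole(&baseReq{DoRemotePrefill: true}))
	fmt.Println(requestRole(&baseReq{}))
}
```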
Review comment: You removed a test here instead of only increasing the indices
Reply: added test 13