-
Notifications
You must be signed in to change notification settings - Fork 16
Pd support #94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Pd support #94
Conversation
…ecode fields Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Update config_test to use a function to create configuration same as defined in the config yaml file Signed-off-by: Maya Barnea <mayab@il.ibm.com>
change command line argument name to 'kv-cache-transfer-latency' Signed-off-by: Maya Barnea <mayab@il.ibm.com>
) | ||
|
||
DescribeTable("invalid configurations", | ||
func(args []string) { | ||
_, err := createSimConfig(args) | ||
Expect(err).To(HaveOccurred()) | ||
}, | ||
Entry(tests[7].name, tests[7].args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed a test here instead of only increasing the indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test 13
|
||
return c | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is confusing, and one function is enough. There is only one case for createDefaultBasicConfig, and we can just update the lora parameters in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basicConfig function was removed
// isDoRemoteDecode() returns true is do_remote_decode is true in the request, this means that this is prefill request | ||
doRemoteDecode() bool | ||
// isDoRemotePrefill() returns true is do_remote_prefill is true in the request, this means that this is decode request | ||
doRemotePrefill() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names in the comments don't match the actual names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/llm-d-inference-sim/request.go
Outdated
RemoteBlockIds []string `json:"remote_block_ids"` | ||
RemoteEngineId string `json:"remote_engine_id"` | ||
RemoteHost string `json:"remote_host"` | ||
RemotePort int `json:"remote_port"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding comments for the fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
pkg/llm-d-inference-sim/response.go
Outdated
RemoteBlockIds []string `json:"remote_block_ids"` | ||
RemoteEngineId string `json:"remote_engine_id"` | ||
RemoteHost string `json:"remote_host"` | ||
RemotePort int `json:"remote_port"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -304,6 +306,8 @@ func (s *VllmSimulator) readRequest(ctx *fasthttp.RequestCtx, isChatCompletion b | |||
var req textCompletionRequest | |||
|
|||
err := json.Unmarshal(ctx.Request.Body(), &req) | |||
|
|||
fmt.Printf("Unmarshaled text request: %#v\n", req) | |||
return &req, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug printing?
pkg/llm-d-inference-sim/simulator.go
Outdated
@@ -477,16 +491,23 @@ func (s *VllmSimulator) reqProcessingWorker(ctx context.Context, id int) { | |||
isChatCompletion: reqCtx.isChatCompletion, | |||
model: displayModel, | |||
}, | |||
responseTokens, toolCalls, finishReason, usageDataToSend, | |||
responseTokens, toolCalls, finishReason, usageDataToSend, req.doRemotePrefill(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add doRemotePrefill to streamingContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, added
pkg/llm-d-inference-sim/simulator.go
Outdated
@@ -638,6 +671,14 @@ func (s *VllmSimulator) sendResponse(isChatCompletion bool, ctx *fasthttp.Reques | |||
s.responseSentCallback(modelName) | |||
} | |||
|
|||
// returns time to first token based on whether |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on whether what? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
No description provided.