Skip to content

Pd support #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Pd support #94

wants to merge 4 commits into from

Conversation

mayabar
Copy link
Collaborator

@mayabar mayabar commented Jul 14, 2025

No description provided.

mayabar added 3 commits July 14, 2025 16:08
…ecode fields

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Update config_test to use a function to create configuration same as defined in the config yaml file

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
change command line argument name to 'kv-cache-transfer-latency'

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar mayabar requested review from shmuelk and irar2 July 14, 2025 13:12
)

DescribeTable("invalid configurations",
func(args []string) {
_, err := createSimConfig(args)
Expect(err).To(HaveOccurred())
},
Entry(tests[7].name, tests[7].args),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed a test here instead of only increasing the indices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test 13


return c
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing, and one function is enough. There is only one case for createDefaultBasicConfig, and we can just update the lora parameters in the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basicConfig function was removed

// isDoRemoteDecode() returns true is do_remote_decode is true in the request, this means that this is prefill request
doRemoteDecode() bool
// isDoRemotePrefill() returns true is do_remote_prefill is true in the request, this means that this is decode request
doRemotePrefill() bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names in the comments don't match the actual names

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

RemoteBlockIds []string `json:"remote_block_ids"`
RemoteEngineId string `json:"remote_engine_id"`
RemoteHost string `json:"remote_host"`
RemotePort int `json:"remote_port"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding comments for the fields

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

RemoteBlockIds []string `json:"remote_block_ids"`
RemoteEngineId string `json:"remote_engine_id"`
RemoteHost string `json:"remote_host"`
RemotePort int `json:"remote_port"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -304,6 +306,8 @@ func (s *VllmSimulator) readRequest(ctx *fasthttp.RequestCtx, isChatCompletion b
var req textCompletionRequest

err := json.Unmarshal(ctx.Request.Body(), &req)

fmt.Printf("Unmarshaled text request: %#v\n", req)
return &req, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug printing?

@@ -477,16 +491,23 @@ func (s *VllmSimulator) reqProcessingWorker(ctx context.Context, id int) {
isChatCompletion: reqCtx.isChatCompletion,
model: displayModel,
},
responseTokens, toolCalls, finishReason, usageDataToSend,
responseTokens, toolCalls, finishReason, usageDataToSend, req.doRemotePrefill(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add doRemotePrefill to streamingContext?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, added

@@ -638,6 +671,14 @@ func (s *VllmSimulator) sendResponse(isChatCompletion bool, ctx *fasthttp.Reques
s.responseSentCallback(modelName)
}

// returns time to first token based on whether
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on whether what? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar mayabar requested a review from irar2 July 15, 2025 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants