Enable Batch Inferencing Benchmarking Support #102


Open
rgreenberg1 opened this issue Mar 31, 2025 · 0 comments

Comments

@rgreenberg1 (Collaborator)

Purpose:
The purpose of this ticket is to support batch inference requests in GuideLLM. We will need to add a batch-size parameter so the user can dictate the batch size of inference requests.

We will want to add non-streaming batched inferencing, similar to this PR from llm-load-test: openshift-psap/llm-load-test#66

Acceptance Criteria:

  • Add a batch-size parameter that lets the user dictate the batch size of inference requests
  • Batching should be non-streaming (see the sketch below)
  • Once the parameter is set, send requests in batches of that size
    Example:
    --batch-size=16, --batch-size=32
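
A minimal sketch of what non-streaming batched dispatch could look like, assuming an OpenAI-style /v1/completions server that accepts a list of prompts in a single request. The function name, endpoint URL, and model name below are placeholders for illustration, not GuideLLM's actual request layer:

```python
# Hypothetical sketch: group prompts into batches of `batch_size` and send each
# batch as one non-streaming request. Assumes an OpenAI-style /v1/completions
# endpoint that accepts a list of prompts; GuideLLM's real backend code may differ.
import requests

def send_batched_requests(prompts, batch_size, url="http://localhost:8000/v1/completions"):
    results = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]       # one group of size <= batch_size
        payload = {
            "model": "my-model",                # placeholder model name
            "prompt": batch,                    # list of prompts -> one batched request
            "max_tokens": 64,
            "stream": False,                    # non-streaming, per the acceptance criteria
        }
        resp = requests.post(url, json=payload, timeout=300)
        resp.raise_for_status()
        results.extend(resp.json()["choices"])  # one choice per prompt in the batch
    return results

# Roughly what --batch-size=16 would map to:
# completions = send_batched_requests(prompt_list, batch_size=16)
```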