Enable Batch Inferencing Benchmarking Support #102


Open
rgreenberg1 opened this issue Mar 31, 2025 · 0 comments

Comments

@rgreenberg1 (Collaborator)

Purpose:
The purpose of this ticket is to support batch inference requests in GuideLLM. We will need to add a batch-size parameter so the user can dictate the batch size of inference requests.

We will want to add non-streaming batched inferencing, similar to this PR from llm-load-test: openshift-psap/llm-load-test#66

Acceptance Criteria:

  • Add a batch-size parameter that lets the user dictate the batch size of inference requests
  • Batching should be non-streaming (see the sketch below)
  • Once the parameter is set, send requests in batches of that size
    Example:
    --batch-size=16, --batch-size=32
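
A minimal sketch of what non-streaming batched dispatch could look like, assuming an OpenAI-style /v1/completions server that accepts a list of prompts in a single request. The function name, endpoint URL, and model name below are placeholders for illustration, not GuideLLM's actual request layer:

```python
# Hypothetical sketch: group prompts into batches of `batch_size` and send each
# batch as one non-streaming request. Assumes an OpenAI-style /v1/completions
# endpoint that accepts a list of prompts; GuideLLM's real backend code may differ.
import requests

def send_batched_requests(prompts, batch_size, url="http://localhost:8000/v1/completions"):
    results = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]       # one group of size <= batch_size
        payload = {
            "model": "my-model",                # placeholder model name
            "prompt": batch,                    # list of prompts -> one batched request
            "max_tokens": 64,
            "stream": False,                    # non-streaming, per the acceptance criteria
        }
        resp = requests.post(url, json=payload, timeout=300)
        resp.raise_for_status()
        results.extend(resp.json()["choices"])  # one choice per prompt in the batch
    return results

# Roughly what --batch-size=16 would map to:
# completions = send_batched_requests(prompt_list, batch_size=16)
```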