Feat/max error rate - continued #238


Open: wants to merge 3 commits into main
Conversation

@AlonKellner-RedHat commented Jul 23, 2025

@markurtz This is a continuation of the work done in #171 by @markVaykhansky with the review comments fixed and a few additions.

The main differences from the original PR are:

  1. E2E tests of the new features, using vLLM simulator.
  2. Three new fields in the output report's run_stats: window_error_rate, termination_reason, and status.
  3. A simplification of the windowing mechanism to be "chunked": every GUIDELLM__ERROR_CHECK_WINDOW_SIZE completed requests (success or error), it checks whether the ratio of errors in those requests (or the error count, if --max-error > 1) exceeds --max-error.

E2E tests

By default, these tests check whether the vLLM simulator is available in the local environment; if not, they are skipped and a warning is logged with the command required to build the simulator.
The vLLM simulator is built from source in the tests/e2e/vllm-sim.Dockerfile and extracted to bin/llm-d-inference-sim.
The tests use a VllmSimServer class that wraps the vLLM simulator with start and stop methods; they then invoke the guidellm command and assert on the values in the output report.
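A wrapper like this can be sketched as follows; this is an illustrative skeleton, not the actual VllmSimServer from the test suite, and the command-line arguments shown are assumptions:

```python
import subprocess


class VllmSimServer:
    """Illustrative sketch of a wrapper that starts/stops a simulator binary."""

    def __init__(self, cmd):
        # e.g. cmd = ["bin/llm-d-inference-sim", "--port", "8000"] (assumed flags)
        self.cmd = cmd
        self.process = None

    def start(self):
        # Launch the simulator as a background process.
        self.process = subprocess.Popen(self.cmd)

    def stop(self):
        # Terminate gracefully; fall back to SIGKILL on timeout.
        if self.process is not None:
            self.process.terminate()
            try:
                self.process.wait(timeout=10)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
            self.process = None
```

A test would start the server, run the guidellm command against it, and stop the server in teardown.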
The two test modules have varying run times:

  • test_successful_benchmark (2 tests) - 14.12s
  • test_max_error_benchmark (1 test) - 15.98s

The max error test takes longer because it lets guidellm start sending requests before killing the vLLM sim, then waits for the guidellm command to stop gracefully.
The successful tests share a single vLLM sim instance and are therefore more efficient.
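The availability check that gates these tests could look roughly like this; a minimal sketch, assuming the binary path from this PR, with a generic warning message rather than the exact build command the real tests print:

```python
import logging
from pathlib import Path

# Path taken from the PR description (simulator extracted to bin/).
SIM_BINARY = Path("bin/llm-d-inference-sim")


def vllm_sim_available(binary: Path = SIM_BINARY) -> bool:
    """Return True if the simulator binary exists; otherwise log a warning
    with a build hint and return False so callers can skip their tests."""
    if binary.exists():
        return True
    logging.warning(
        "vLLM simulator not found at %s; build it from tests/e2e/vllm-sim.Dockerfile",
        binary,
    )
    return False
```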

The new run_stats fields

  1. window_error_rate - the error rate within the error check window at the time of run completion (possibly a partial window if --max-error was not reached). This value indicates why --max-error was reached, unlike the global error_rate field, which does not necessarily reflect it.
  2. termination_reason - one of "max_seconds_reached", "max_requests_reached", "max_error_reached", or "interrupted". This is important for future features such as over-saturation termination or target-metric-margin termination.
  3. status - one of "success", "error", or "interrupted". This is a simplification of the termination_reason field and will help differentiate success states from error states; for example, over-saturation termination is an error state while target-metric-margin termination is a success state.
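To illustrate how the three fields relate, here is a hypothetical run_stats excerpt; the field names come from this PR, but the values and the reason-to-status mapping are illustrative assumptions:

```python
# Hypothetical run_stats excerpt; values are illustrative, not from a real run.
run_stats = {
    "window_error_rate": 0.6,
    "termination_reason": "max_error_reached",
    "status": "error",
}

# Assumed mapping from termination_reason to its simplified status.
STATUS_BY_REASON = {
    "max_seconds_reached": "success",
    "max_requests_reached": "success",
    "max_error_reached": "error",
    "interrupted": "interrupted",
}

assert STATUS_BY_REASON[run_stats["termination_reason"]] == run_stats["status"]
```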

"Chunked" error checking

The precise logic is as follows:

  1. Accumulate the total number of completed (errored or successful) requests and the total number of errored requests.
  2. When the total number of completed requests reaches GUIDELLM__ERROR_CHECK_WINDOW_SIZE, check whether max_error is reached.
  3. If max_error > 1, check whether the accumulated number of errored requests exceeds max_error; otherwise, check whether the accumulated errored requests divided by the accumulated completed requests exceeds max_error.
  4. If max_error is reached, break and shut down.
  5. If max_error is not reached, reset the accumulated counts to 0 and keep going.

This approach is simple and handles all the different cases nicely.
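The steps above can be sketched as a standalone function; an illustrative sketch of the logic, not the actual guidellm implementation, with WINDOW_SIZE standing in for GUIDELLM__ERROR_CHECK_WINDOW_SIZE:

```python
WINDOW_SIZE = 10  # stands in for GUIDELLM__ERROR_CHECK_WINDOW_SIZE


def should_shutdown(results, max_error, window_size=WINDOW_SIZE):
    """results: iterable of booleans, True meaning the request errored.

    Returns True as soon as a full window of completed requests
    exceeds max_error; otherwise returns False.
    """
    completed = errored = 0
    for is_error in results:
        # Step 1: accumulate completed and errored request counts.
        completed += 1
        errored += is_error
        # Step 2: check only once a full window has completed.
        if completed >= window_size:
            # Step 3: absolute count if max_error > 1, otherwise a ratio.
            if max_error > 1:
                exceeded = errored > max_error
            else:
                exceeded = errored / completed > max_error
            if exceeded:
                return True  # Step 4: break and shut down.
            completed = errored = 0  # Step 5: reset the window and keep going.
    return False
```

For example, with a window of 10 and max_error of 0.4, a window containing 5 errors (ratio 0.5) triggers shutdown, while an all-success run never does.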

@AlonKellner-RedHat AlonKellner-RedHat changed the base branch from feat/max-error-rate to main July 23, 2025 10:11
Co-authored-by: Mark Vakhansky <mvakhans@redhat.com>

Signed-off-by: AlonKellner-RedHat <akellner@redhat.com>