Have GuideLLM kick off a vLLM server automatically to avoid having the user install vLLM and assign the target themselves #95

Open
rgreenberg1 opened this issue Mar 9, 2025 · 1 comment
Labels
enhancement New feature or request


@rgreenberg1
Collaborator

Description:
The proposal here is to change the architecture of how GuideLLM runs so that, when a user runs GuideLLM, it automatically kicks off a vLLM server and supplies GuideLLM with the connection details needed to run the benchmark. This also covers adding to GuideLLM any pass-through parameters that need to reach vLLM, so that the user can run a single GuideLLM command to do a full end-to-end benchmark on a model. This is a UX enhancement.

Acceptance Criteria:

  • Enable GuideLLM to kick off a vLLM server when GuideLLM is run
  • Enable GuideLLM to accept the necessary pass-through arguments that need to be forwarded to vLLM (see the sketch after this list):
    -- model (required)
    -- port (optional)
    -- TBD
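
For concreteness, here is a minimal sketch (not existing GuideLLM code) of what the proposed orchestration could look like: spawn `vllm serve` as a subprocess, poll the server until it responds over HTTP, and use the resulting URL as the benchmark target. The function name `launch_vllm_server`, its parameters, and the example model ID are assumptions for illustration; it also assumes the `vllm` CLI and the `requests` package are installed.

```python
import subprocess
import time

import requests


def launch_vllm_server(
    model: str,
    port: int = 8000,
    extra_args: list[str] | None = None,
    startup_timeout: float = 600.0,
) -> subprocess.Popen:
    """Start `vllm serve` as a child process and wait until it answers HTTP."""
    cmd = ["vllm", "serve", model, "--port", str(port), *(extra_args or [])]
    proc = subprocess.Popen(cmd)

    url = f"http://localhost:{port}/health"  # health endpoint of vLLM's OpenAI-compatible server
    deadline = time.monotonic() + startup_timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(f"vLLM exited early with code {proc.returncode}")
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return proc
        except requests.RequestException:
            pass  # server not reachable yet; keep polling
        time.sleep(2)

    proc.terminate()
    raise TimeoutError(f"vLLM did not become ready within {startup_timeout}s")


if __name__ == "__main__":
    # Hypothetical end-to-end flow: start the server, then hand the target URL
    # to the existing GuideLLM benchmark entrypoint (exact CLI/API may differ).
    server = launch_vllm_server("meta-llama/Llama-3.1-8B-Instruct", port=8000)
    try:
        target = "http://localhost:8000"
        print(f"vLLM is ready; GuideLLM would now benchmark against {target}")
    finally:
        server.terminate()
        server.wait()
```
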
@dagrayvid
Collaborator

dagrayvid commented Mar 10, 2025

I'm not sure that we should add orchestration of the vLLM server to the scope of GuideLLM, for a few reasons:

  • Want the tool to stay agnostic of the runtime engine; we may want to add different backends to support other runtime engines.
  • Don't want the tool to be opinionated about the platform that the runtime engine is deployed on. If we wanted to support model deployment we would likely want to support k8s, KServe, Podman, bare-metal, etc.
  • Want to avoid scope creep and keep GuideLLM focused on its current purpose: running load tests against a pre-deployed model endpoint.

Alternatives:

  • A separate project focused on simplifying model deployment that supports all platforms we care about (k8s Deployment, kserve, Podman, local [vllm serve]). We could even try to keep a set of "known-good" model configurations in that repo.
  • Refer users to existing model deployment guides / mechanisms.

@rgreenberg1 moved this to Backlog in the GuideLLM Kanban Board on May 8, 2025