
GuideLLM DS Generation Engine #26 #134


Open
SharonGil opened this issue Apr 24, 2025 · 1 comment

SharonGil commented Apr 24, 2025

Motivation

GuideLLM currently accepts Hugging Face-hosted datasets (DSs), paths to local DSs, or synthetic DSs.
To demonstrate the strengths of KV-cache-aware routing, we need an easy way to create DSs that represent the use cases where this kind of routing delivers the highest value, such as RAG-based apps and agentic apps.

The Plan

Create a DS generation engine that receives use-case requirements as parameters and returns a complete, GuideLLM-ready DS matching the use case. The DS will be ready to be fed to GuideLLM Benchmark via the --data parameter without any changes.

High-level steps of implementation

The engine will consist of two consecutive layers:

  1. The first layer receives requirements as parameters (e.g. number of different apps, system-prompt length, tools length, RAG-doc length, number of RAG docs per app, etc.) and returns a JSON file containing all the apps required by the user, in a textual, human-readable form, where all lengths are fully configurable.
    Simplified example:

```json
{
  "systemPrompt": "8DzB0vXMMDO1ihCpCNsEBDH2FrHfmnR",
  "tools": "iSvQglvUQgoapyEWuYjNvgrqRR8DeX6zH6vQfQoC0OSSzcafs1XHHHLnxYS9O",
  "ragDocs": [
    "Iwat4dvnPdrmsLhYEP8RTsR9Es1kc4MI0wIfsFG55",
    "0xYplap6ennnt6nlhBFMjlJTHNU8kW68JhaHY6TK"
  ]
}
```

  2. The second layer receives the output JSON of the first layer as input, along with use-case-related parameters (e.g. number of users, number of requests per user session, number of users that share the same app, number of documents per user, etc.), and then compresses and flattens it into a GuideLLM-ready, prompt-based DS in a way that reflects the use case.
    For example: given 10 users, where every 2 users share the same app and each uses 2 of its documents, the layer will create pairs of consecutive prompts sharing the same app's system prompt and tools, differing only in the RAG docs chosen (possibly) and in the user prompt.
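The two layers above can be sketched roughly as follows. This is a minimal sketch: all function and parameter names are hypothetical, and lengths are measured in characters rather than tokens for simplicity.

```python
import random
import string


def make_app(system_prompt_len=32, tools_len=64, rag_doc_len=40,
             rag_doc_count=2, rng=None):
    """Layer 1 (sketch): build one app spec with random filler text
    of fully configurable lengths."""
    rng = rng or random.Random()

    def text(n):
        return "".join(rng.choices(string.ascii_letters + string.digits, k=n))

    return {
        "systemPrompt": text(system_prompt_len),
        "tools": text(tools_len),
        "ragDocs": [text(rag_doc_len) for _ in range(rag_doc_count)],
    }


def flatten_to_prompts(apps, users_per_app=2, requests_per_user=2,
                       docs_per_request=1, rng=None):
    """Layer 2 (sketch): flatten app specs into a prompt-based dataset.

    Consecutive prompts for users of the same app share that app's
    system prompt and tools, differing only in the sampled RAG docs
    and the (filler) user prompt."""
    rng = rng or random.Random()
    rows = []
    for app in apps:
        for user in range(users_per_app):
            for req in range(requests_per_user):
                docs = rng.sample(app["ragDocs"],
                                  k=min(docs_per_request, len(app["ragDocs"])))
                prompt = "\n".join([app["systemPrompt"], app["tools"],
                                    *docs, f"user {user} request {req}"])
                rows.append({"prompt": prompt})
    return rows


seed = random.Random(0)
apps = [make_app(rng=seed) for _ in range(2)]
dataset = flatten_to_prompts(apps, rng=random.Random(1))
# 2 apps x 2 users x 2 requests = 8 prompt rows
```

The resulting list of `{"prompt": ...}` rows is what would ultimately be serialized into the format GuideLLM's `--data` parameter expects.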

Link to issue in Distributed-KV-Cache repo - llm-d/llm-d-kv-cache-manager#4

@markurtz (Member) commented:

@SharonGil I like the idea of an extension to enable RAG-style use cases, especially as it relates to the piece you mentioned here for testing KV cache setups. Rather than going through JSON, though, I'd recommend we build this functionality in natively, either by extending the SyntheticDatasetCreator class and SyntheticDatasetConfig, which would take an optional type for the synthetic data and create the desired output formats, or by adding a new DatasetCreator for this specific use case that we route to based on a type passed in the input args config for the --data parameter. This way we can avoid the intermediate JSON and enable a more portable and dynamic solution for command sharing and reproducibility.
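The routing idea could look roughly like the following. This is a hypothetical sketch only: the real SyntheticDatasetCreator and SyntheticDatasetConfig interfaces in GuideLLM will differ, and the field and method names here are assumptions meant to illustrate dispatching on an optional type instead of an intermediate JSON file.

```python
from dataclasses import dataclass


@dataclass
class SyntheticDatasetConfig:
    # Existing-style knobs (names assumed for illustration)
    prompt_tokens: int = 256
    output_tokens: int = 128
    # Proposed optional type selecting the generator ("plain", "rag", ...)
    synthetic_type: str = "plain"


class SyntheticDatasetCreator:
    def create(self, config: SyntheticDatasetConfig):
        # Route on the optional type rather than a JSON intermediate
        if config.synthetic_type == "rag":
            return self._create_rag(config)
        return self._create_plain(config)

    def _create_plain(self, config):
        # Placeholder filler text standing in for real synthetic generation
        return [{"prompt": "x" * config.prompt_tokens}]

    def _create_rag(self, config):
        # RAG-style rows share a common system-prompt/tools prefix
        shared_prefix = "system+tools "
        return [{"prompt": shared_prefix + f"doc {i}"} for i in range(2)]


rows = SyntheticDatasetCreator().create(
    SyntheticDatasetConfig(synthetic_type="rag"))
```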

If you have an example of what that generator would look like, I can help advise on how I think it would fit in easiest to the existing DatasetCreator flows.

Also, one other note we'll need to consider with these is that there isn't currently any assumption about session- or user-based requests at the data / load generation level. So we'll likely need to add some logic there so we can better simulate those users and ensure we aren't sending multiple requests from this concept of a single user at once. This could be a follow-up or a feature we leave open for the future, though.
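One naive shape for that session logic, sketched under the assumption that each dataset row carries a hypothetical user field: round-robin across per-user queues so consecutive requests in the issue order never come from the same simulated user.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_user(rows):
    """Order rows so that each user's requests are issued one at a time,
    round-robin across users (a naive stand-in for real session logic)."""
    queues = defaultdict(list)
    for row in rows:
        queues[row["user"]].append(row)
    ordered = []
    # zip_longest pads shorter queues with None, which we skip
    for batch in zip_longest(*queues.values()):
        ordered.extend(r for r in batch if r is not None)
    return ordered


rows = [{"user": u, "req": r} for u in range(2) for r in range(2)]
schedule = interleave_by_user(rows)
```

A real implementation would also need to gate on request completion (not just ordering) so a user's next request only starts after the previous one returns.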
