Motivation
GuideLLM currently accepts Hugging Face (HF) hosted datasets (DSs), paths to local DSs, or synthetic DSs.
To demonstrate the strengths of KVCache-aware routing, we need an easy way to create DSs that represent the use cases that benefit most from this kind of routing, such as RAG-based apps and agentic apps.
The Plan
Create a DS generation engine that receives use-case requirements as parameters and returns a full, guideLLM-ready DS matching that use case. The DS will be ready to be fed to GuideLLM Benchmark as the --data parameter without any changes.
High-level steps of implementation
The engine will consist of two consecutive layers:
The first layer will receive requirements as parameters (e.g. number of different Apps, system-prompt length, tools length, RAG-doc length, number of RAG docs per App, etc.) and will return a JSON file containing all the Apps required by the user, in a textual, human-readable form, where all lengths are fully configurable.
simplified example:
{
  "systemPrompt": "8DzB0vXMMDO1ihCpCNsEBDH2FrHfmnR",
  "tools": "iSvQglvUQgoapyEWuYjNvgrqRR8DeX6zH6vQfQoC0OSSzcafs1XHHHLnxYS9O",
  "ragDocs": [
    "Iwat4dvnPdrmsLhYEP8RTsR9Es1kc4MI0wIfsFG55",
    "0xYplap6ennnt6nlhBFMjlJTHNU8kW68JhaHY6TK"
  ]
}
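As a rough sketch of what the first layer's generator could look like (all function names and parameters here are hypothetical illustrations, not existing GuideLLM APIs):

```python
import random
import string


def _rand_text(length: int) -> str:
    # Hypothetical helper: random printable filler standing in for
    # real text of a configurable length.
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


def generate_apps(
    num_apps: int,
    system_prompt_len: int,
    tools_len: int,
    rag_doc_len: int,
    rag_docs_per_app: int,
) -> list[dict]:
    # Layer 1: build a human-readable description of each App,
    # with every length fully configurable by the caller.
    return [
        {
            "systemPrompt": _rand_text(system_prompt_len),
            "tools": _rand_text(tools_len),
            "ragDocs": [_rand_text(rag_doc_len) for _ in range(rag_docs_per_app)],
        }
        for _ in range(num_apps)
    ]


# Example: two Apps shaped like the JSON above.
apps = generate_apps(num_apps=2, system_prompt_len=31, tools_len=61,
                     rag_doc_len=41, rag_docs_per_app=2)
```

The list of App dicts could then be serialized with `json.dump` to produce the layer's output file.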
The second layer will receive the first layer's output JSON as input, along with use-case-related parameters (e.g. number of users, number of requests per user session, number of users sharing the same App, number of documents per user, etc.), and will compress and flatten it into a guideLLM-ready, prompt-based DS in a way that takes the use case into consideration.
e.g. 10 users, where every 2 users share the same App and use 2 of its documents - the layer will create pairs of consecutive prompts sharing the same App's system prompt and tools, differing only in the RAG docs chosen (possibly) and in the user prompt.
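A minimal sketch of how the second layer could flatten the App descriptions into consecutive, prefix-sharing prompts (the function and its parameters are hypothetical, assuming the App structure produced by the first layer):

```python
def flatten_to_prompts(apps, num_users, requests_per_user,
                       users_per_app, docs_per_user):
    # Layer 2 (sketch): expand App descriptions into a flat, prompt-based DS.
    # Users assigned to the same App produce consecutive prompts that share
    # the App's system prompt and tools, differing only in the chosen RAG
    # docs and the per-request user prompt.
    prompts = []
    for user in range(num_users):
        app = apps[(user // users_per_app) % len(apps)]
        docs = app["ragDocs"][:docs_per_user]
        for req in range(requests_per_user):
            prompts.append(
                app["systemPrompt"] + "\n" + app["tools"] + "\n"
                + "\n".join(docs)
                + f"\nuser-{user} request-{req}"  # placeholder user prompt
            )
    return prompts
```

With `num_users=10` and `users_per_app=2`, pairs of consecutive users emit prompts with an identical system-prompt/tools/docs prefix, which is exactly the shared-prefix pattern a KVCache-aware router can exploit.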
@SharonGil I like the idea of an extension to enable RAG-style use cases, especially as it relates to the piece you mentioned here for testing KV Cache setups. I'd recommend rather than going to JSON, though, we build this functionality in natively either by extending the SyntheticDatasetCreator class and SyntheticDatasetConfig, which would take in an optional type for the synthetic data to create and create the desired output formats. Either that, or we add in a new DatasetCreator for this specific use case and we can route it based on a type passed in the input args config for the --data parameter. This way we can avoid going to the intermediate JSON and enable a more portable and dynamic solution for command sharing and reproducibility.
If you have an example of what that generator would look like, I can help advise on how I think it would fit in easiest to the existing DatasetCreator flows.
Also, one other note we'll need to consider with these is that there isn't currently any assumption about session / user based requests at the data / load generation level. So, we'll likely need to add some logic there so we can better simulate those users and ensure we aren't sending multiple requests from this concept of a single user at once. This could be a follow-up or a feature we leave open for the future, though.
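One hedged sketch of the session logic this implies - each simulated user awaits its requests strictly one at a time, while distinct users run concurrently (the names and structure here are illustrative, not existing GuideLLM code):

```python
import asyncio


async def run_user(user_id, prompts, send):
    # A single simulated user: each request is awaited before the next is
    # issued, so this user never has more than one request in flight.
    results = []
    for prompt in prompts:
        results.append(await send(user_id, prompt))
    return results


async def run_benchmark(sessions, send):
    # Distinct users run concurrently; ordering is only enforced per user.
    # `sessions` maps user id -> list of prompts; `send` is the request fn.
    return await asyncio.gather(
        *(run_user(uid, prompts, send) for uid, prompts in sessions.items())
    )
```

This keeps per-user request ordering without serializing the whole benchmark, which seems like the property needed to simulate user sessions faithfully.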
Link to issue in Distributed-KV-Cache repo - llm-d/llm-d-kv-cache-manager#4