Add unique prefix - increasing counter #217
I am trying to figure out the lint errors. When I run the check locally, they all seem to pass. :) ruff check --fix tests/unit/dataset/test_synthetic.py
End-to-end test: ran the following command against an inference server running llama: ` vLLM output: |
📦 Build Artifacts Available
The SyntheticTextItemsGenerator was producing prompts that could trigger vLLM's automatic prefix caching, resulting in prefix-cache hit rates of up to 80% in some performance benchmarking runs.
This change injects a unique prefix into each generated prompt to guarantee a 0% prefix cache hit rate while preserving realistic prompt characteristics.
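A minimal sketch of the idea, assuming the PR prepends a strictly increasing counter to each prompt (the function name `unique_prefix_prompts` is illustrative, not the actual implementation):

```python
from itertools import count


def unique_prefix_prompts(prompts, start=0):
    """Prepend a strictly increasing counter to each prompt so no two
    prompts share a leading token sequence, which prevents vLLM's
    automatic prefix caching from reusing KV-cache blocks across
    requests."""
    counter = count(start)
    for prompt in prompts:
        # The counter differs on every prompt, so even identical prompt
        # bodies diverge from the very first tokens.
        yield f"{next(counter)}: {prompt}"


# Usage: identical base prompts still get distinct prefixes.
prompts = list(unique_prefix_prompts(["tell me a story"] * 3))
# prompts[0] == "0: tell me a story", prompts[1] == "1: tell me a story", ...
```

Because prefix caching matches on leading cache blocks, differing in the first tokens is sufficient; the rest of the prompt can stay realistic.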
Test:
Running tests on the H200 target accelerator to confirm the fix.