[P/D] add acc test script of hpu pd disagg #1394
base: habana_main
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
Pull Request Overview
This PR adds a new pipeline to run and compare non-disaggregated (baseline) and disaggregated accuracy tests against a vLLM-based model.
- Introduces a Python script (`test_disagg_accuracy.py`) to generate outputs, save baseline results, and verify exact matches in disagg mode.
- Adds a Bash script (`run_hpu_disagg_accuracy_test.sh`) to orchestrate etcd, Mooncake, and the vLLM servers, and to execute both test modes end-to-end.
- Includes `mooncake.json` for MooncakeStoreConnector configuration.
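The save-then-compare flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `test_disagg_accuracy.py`: the function names `save_baseline` and `compare_with_baseline` and the prompt-to-completion JSON layout are assumptions, and an exact-match check only makes sense if both runs use deterministic decoding.

```python
import json
from pathlib import Path


def save_baseline(outputs: dict[str, str], path: Path) -> None:
    """Baseline mode: persist prompt -> completion pairs as JSON (hypothetical layout)."""
    path.write_text(json.dumps(outputs, indent=2, ensure_ascii=False), encoding="utf-8")


def compare_with_baseline(outputs: dict[str, str], path: Path) -> list[str]:
    """Disagg mode: return the prompts whose completion differs from the baseline."""
    try:
        baseline = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as e:
        # This branch reads the file, so the message reports a read error.
        raise RuntimeError(f"Error reading baseline file {path}: {e}") from e
    return [prompt for prompt, text in outputs.items() if baseline.get(prompt) != text]
```

An empty list from `compare_with_baseline` means every disaggregated output matched the baseline exactly; a caller could assert on that to fail the test.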
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
pd_xpyd/test_disagg_accuracy.py | New client script: sends prompts, writes baseline JSON, and asserts disaggregated outputs match |
pd_xpyd/run_hpu_disagg_accuracy_test.sh | Automation script: starts services, runs baseline/disagg tests, and cleans up |
pd_xpyd/mooncake.json | Configuration for etcd/Mooncake key-value store connector |
Comments suppressed due to low confidence (3)
pd_xpyd/test_disagg_accuracy.py:148
- In disagg mode you are reading the baseline file, so this should report a read error instead of "Error writing to file".
print(f"Error writing to file: {e}")
pd_xpyd/test_disagg_accuracy.py:69
- The docstring says "two optional string arguments" but `--service_url` and `--model_name` are required. Update the docstring to reflect the actual behavior and all arguments.
"""
This script demonstrates how to accept two optional string arguments
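One way to address this comment is to let the docstring and the parser state the same thing. The sketch below is an assumption about how the arguments could be declared, not the script's actual parser: only `--service_url` and `--model_name` come from the review, while `parse_args` and the `--mode` flag are hypothetical names used for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the accuracy-test client's arguments.

    --service_url and --model_name are required.
    --mode is a hypothetical flag selecting the baseline or disagg run.
    """
    parser = argparse.ArgumentParser(
        description="Send prompts to a vLLM service and record or compare outputs."
    )
    parser.add_argument("--service_url", required=True, help="Base URL of the service")
    parser.add_argument("--model_name", required=True, help="Model name to query")
    parser.add_argument(
        "--mode",
        choices=["baseline", "disagg"],
        default="baseline",
        help="Hypothetical: which run this invocation belongs to",
    )
    return parser.parse_args(argv)
```

With `required=True`, argparse exits with a usage error when either argument is missing, so the docstring's "required" wording stays accurate.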
pd_xpyd/run_hpu_disagg_accuracy_test.sh:23
- [nitpick] This variable uses lowercase naming while others are uppercase. Consider renaming to `MAX_NUM_BATCHED_TOKENS` for consistency with the script’s style.
max_num_batched_tokens=2048
Do you think we can reuse this file? https://github.com/HabanaAI/vllm-fork/blob/habana_main/tests/kv_transfer/test_disagg.py
Not very reusable. Here, I run the baseline and PD with one click, then check whether the outputs are consistent.
Currently, this script is being used as a demo for users or QA.
This PR adds disaggregated (disagg) vs non-disaggregated (baseline) accuracy tests. The changes include the addition of configuration files, a shell script to automate the test process, and a Python script to validate the outputs.