add single request test case for aclgraph #3392
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new end-to-end test for aclgraph with data parallelism. The test is well-structured, but there's a critical issue in how the test server is initialized that will prevent the test from running. I've also identified a potential source of test flakiness due to a hardcoded port and provided a comprehensive suggestion to fix both issues, making the test more robust.
env_dict = {
    "TASK_QUEUE_ENABLE": "1",
    "HCCL_OP_EXPANSION_MODE": "AIV",
}
server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size",
    "1", "--data-parallel-size", str(dp_size), "--port", "20002",
    "--trust-remote-code", "--gpu-memory-utilization", "0.9"
]
request_keyword_args: dict[str, Any] = {
    **api_keyword_args,
}
with RemoteOpenAIServer(model,
                        server_args,
                        server_port=20002,
                        env_dict=env_dict,
                        auto_port=False) as server:
There are a couple of issues with the RemoteOpenAIServer setup:

- Incorrect RemoteOpenAIServer call (Critical): The current call to RemoteOpenAIServer is incorrect. The __init__ signature in tests/e2e/conftest.py is (self, model: str, server_host: str, server_port: int, vllm_serve_args: list[str], ...). Your call provides server_args (a list) as the second positional argument for server_host (a string), and misses the required vllm_serve_args argument. This will cause a TypeError at runtime.
- Hardcoded port (High): The test uses a hardcoded port, 20002. This can cause flaky tests due to port conflicts when running tests in parallel or if the port is already occupied. It's better to use a dynamically allocated port.
Here is a suggested fix that addresses both issues: it uses vllm.utils.get_open_port to dynamically allocate a port and corrects the RemoteOpenAIServer call. The import is added inside the function for simplicity of the suggestion.
from vllm.utils import get_open_port
port = get_open_port()
env_dict = {
"TASK_QUEUE_ENABLE": "1",
"HCCL_OP_EXPANSION_MODE": "AIV",
}
server_args = [
"--no-enable-prefix-caching", "--tensor-parallel-size",
"1", "--data-parallel-size", str(dp_size), "--port", str(port),
"--trust-remote-code", "--gpu-memory-utilization", "0.9"
]
request_keyword_args: dict[str, Any] = {
**api_keyword_args,
}
with RemoteOpenAIServer(model=model,
server_host="localhost",
server_port=port,
vllm_serve_args=server_args,
env_dict=env_dict,
auto_port=False) as server:
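Note that auto_port=False is retained in this suggestion: since the port is now allocated explicitly via get_open_port() and passed consistently as both the --port server argument and the server_port keyword, the wrapper presumably has no need to pick a port on its own.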
@lilinsiman Please check these suggested changes.
After checking, the suggested modification is indeed better: calling the get_open_port function automatically returns an available port, ensuring the port is not already occupied and thus guaranteeing its availability.
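For context, a get_open_port-style helper is typically implemented by binding a socket to port 0 so the operating system picks a currently free ephemeral port. The sketch below illustrates the general technique; it is an assumption for illustration, not the actual body of vllm.utils.get_open_port.

import socket

def get_open_port_sketch() -> int:
    # Binding to port 0 asks the OS to assign any free ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        # getsockname() reports the (host, port) pair the OS assigned.
        return s.getsockname()[1]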
Force-pushed from d6afb6e to 279e6db
Please enable this test in the .github workflow.
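Enabling the test would presumably mean adding its path to the e2e test job in the repository's .github/workflows configuration so it runs in CI.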
@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dp_size", DATA_PARALLELS)
async def test_models(model: str, dp_size: int) -> None:
Change the test case name.
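For illustration, here is a hedged sketch of how the fragments in this PR might fit together once the case is renamed and the suggested server setup is applied. The model list, the renamed test, the conftest import path, and the get_async_client() usage are assumptions for the sketch, not the exact code under review.

import pytest
from vllm.utils import get_open_port

from tests.e2e.conftest import RemoteOpenAIServer  # assumed import path

MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]  # hypothetical model list
DATA_PARALLELS = [2]

@pytest.mark.asyncio
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dp_size", DATA_PARALLELS)
async def test_aclgraph_single_request_dp(model: str, dp_size: int) -> None:
    # Allocate a free port dynamically to avoid conflicts in parallel runs.
    port = get_open_port()
    env_dict = {
        "TASK_QUEUE_ENABLE": "1",
        "HCCL_OP_EXPANSION_MODE": "AIV",
    }
    server_args = [
        "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
        "--data-parallel-size", str(dp_size), "--port", str(port),
        "--trust-remote-code", "--gpu-memory-utilization", "0.9",
    ]
    with RemoteOpenAIServer(model=model,
                            server_host="localhost",
                            server_port=port,
                            vllm_serve_args=server_args,
                            env_dict=env_dict,
                            auto_port=False) as server:
        # get_async_client() is assumed to mirror vLLM's e2e helper API,
        # returning an AsyncOpenAI client pointed at the local server.
        client = server.get_async_client()
        completion = await client.completions.create(
            model=model, prompt="Hello, my name is", max_tokens=16)
        assert completion.choices[0].text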
server_args = [
    "--no-enable-prefix-caching", "--tensor-parallel-size", "1",
    "--data-parallel-size",
    str(dp_size), "--port", "20002", "--trust-remote-code",
Delete the hardcoded port code.
Force-pushed from 279e6db to e73fd45
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
Force-pushed from e73fd45 to 407fb7a
What this PR does / why we need it?
This PR adds an online single-request DP2 test case for aclgraph.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT.