Update vLLM Benchmark Suite for Xeon based on 0.9.2 release #21486
base: main
Conversation
Code Review
This pull request updates the vLLM benchmark suite for Xeon CPUs by adding new benchmark configurations and parameters. My review identified a couple of critical issues that need to be addressed:

- A recurring typo in an environment variable (VLLM_CPU_SGL_KERNE instead of VLLM_CPU_SGL_KERNEL) across all new and modified benchmark configuration files. This would prevent the intended performance optimizations from taking effect.
- A logical inconsistency in one of the new test configurations (serving-tests-cpu-snc3.json), where the pipeline_parallel_size does not match the test name, which would lead to incorrect benchmark runs.

I've provided specific suggestions to correct these issues. Once these are fixed, the changes should correctly configure the benchmarks for the intended performance improvements.
"VLLM_RPC_TIMEOUT": 100000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120, | ||
"VLLM_CPU_SGL_KERNE": 1, |
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
"VLLM_RPC_TIMEOUT": 100000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120, | ||
"VLLM_CPU_SGL_KERNE": 1, |
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
"VLLM_RPC_TIMEOUT": 100000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120, | ||
"VLLM_CPU_SGL_KERNE": 1, |
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
"VLLM_RPC_TIMEOUT": 100000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120, | ||
"VLLM_CPU_SGL_KERNE": 1, |
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
"VLLM_RPC_TIMEOUT": 100000, | ||
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1, | ||
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120, | ||
"VLLM_CPU_SGL_KERNE": 1, |
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
@@ -6,6 +6,7 @@
"VLLM_RPC_TIMEOUT": 100000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
"VLLM_CPU_SGL_KERNE": 1,
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
@@ -36,6 +39,7 @@
"VLLM_RPC_TIMEOUT": 100000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
"VLLM_CPU_SGL_KERNE": 1,
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
@@ -66,6 +72,7 @@
"VLLM_RPC_TIMEOUT": 100000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
"VLLM_CPU_SGL_KERNE": 1,
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
@@ -96,6 +105,7 @@
"VLLM_RPC_TIMEOUT": 100000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
"VLLM_CPU_SGL_KERNE": 1,
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
@@ -129,6 +141,7 @@
"VLLM_RPC_TIMEOUT": 100000,
"VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
"VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
"VLLM_CPU_SGL_KERNE": 1,
There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.
"VLLM_CPU_SGL_KERNEL": 1,
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
6e50c04 to 3b35747 (Compare)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
3b35747 to d33e81d (Compare)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
8dfc24c to 2d3c3ed (Compare)
@@ -33,7 +33,8 @@ check_gpus() {

 check_cpus() {
   # check the number of CPUs and NUMA Node and GPU type.
-  declare -g numa_count=$(python3 -c "from numa import info;numa_size = info.get_num_configured_nodes(); print(numa_size)")
+  last_numa_index=$(cat /sys/devices/system/node/online | cut -d'-' -f2)
+  declare -g numa_count=$((last_numa_index+1))
Perhaps using numa_count=$(lscpu | grep "NUMA node(s):" | awk '{print $3}') is more robust.
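To make the comparison concrete, here is a sketch of both ways of deriving numa_count (assuming a standard Linux sysfs layout and lscpu output; not verbatim from the PR):

# PR approach: read the online-node range, e.g. "0-5" -> last index 5 -> 6 nodes.
# It assumes node IDs are contiguous; a list such as "0,2-3" would be miscounted.
last_numa_index=$(cat /sys/devices/system/node/online | cut -d'-' -f2)
declare -g numa_count=$((last_numa_index + 1))

# Suggested alternative: use the node count that lscpu already reports.
declare -g numa_count=$(lscpu | grep "NUMA node(s):" | awk '{print $3}')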
@@ -0,0 +1,209 @@
[
  {
    "test_name": "serving_llama8B_pp1_sharegpt",
Should be serving_llama8B_pp6_sharegpt, to match the pipeline_parallel_size of 6 below.
},
"server_parameters": {
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "pipeline_parallel_size": 6,
Meanwhile, I recommend using TP and PP together rather than a single large PP size. For this case, -tp=2 -pp=3 may be a better setting.
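For illustration, the server_parameters for that combination might look roughly like this (a sketch only; tensor_parallel_size is assumed to be the matching config key, by analogy with pipeline_parallel_size in the file above):

"server_parameters": {
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "tensor_parallel_size": 2,
  "pipeline_parallel_size": 3
}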
0.10.0 has been released. Perhaps we should make this PR based on that :)
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model

Purpose
Make sure all needed run parameters are included in the vLLM Benchmark Suite for 0.9.2.
These new run parameters deliver a 1.3X speedup on the v0.9.2 release.
Test Plan
Manual testing.
Test Result
1.3X speedup
(Optional) Documentation Update