Update vLLM Benchmark Suite for Xeon based on 0.9.2 release #21486

louie-tsai · 2025-07-24T02:23:38Z

Purpose

Ensure the vLLM Benchmark Suite sets all run parameters needed for the 0.9.2 release.
These new run parameters contribute to a 1.3X speedup on the v0.9.2 release.

Test Plan

Manual testing.

Test Result

1.3X speedup

(Optional) Documentation Update

gemini-code-assist

Code Review

This pull request updates the vLLM benchmark suite for Xeon CPUs by adding new benchmark configurations and parameters. My review identified a couple of critical issues that need to be addressed:

A recurring typo in an environment variable (VLLM_CPU_SGL_KERNE instead of VLLM_CPU_SGL_KERNEL) across all new and modified benchmark configuration files. This would prevent the intended performance optimizations from taking effect.
A logical inconsistency in one of the new test configurations (serving-tests-cpu-snc3.json), where the pipeline_parallel_size does not match the test name, which would lead to incorrect benchmark runs.

I've provided specific suggestions to correct these issues. Once these are fixed, the changes should correctly configure the benchmarks for the intended performance improvements.
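The name/parameter mismatch described above could be caught mechanically before a benchmark run. The following is a minimal sketch, not part of this PR. It assumes each test entry is a dict with a "test_name" field and a "server_parameters" block, and that the pipeline-parallel size is encoded in the name as "_ppN". Both the naming convention and the sample entries are hypothetical.

```python
import re


def pp_mismatches(tests):
    """Return (test_name, size_in_name, configured_size) for each mismatch."""
    mismatches = []
    for test in tests:
        name = test.get("test_name", "")
        # Assumed convention: the name carries the size as "_pp<N>".
        match = re.search(r"_pp(\d+)(?:_|$)", name)
        if match is None:
            continue  # name does not encode a pipeline-parallel size
        expected = int(match.group(1))
        # Assumed layout: the size lives under "server_parameters"; default to 1.
        params = test.get("server_parameters", {})
        configured = params.get("pipeline_parallel_size", 1)
        if configured != expected:
            mismatches.append((name, expected, configured))
    return mismatches


# Hypothetical entries for illustration; not copied from the PR.
sample_tests = [
    {
        "test_name": "serving_example_pp3_sharegpt",
        "server_parameters": {"pipeline_parallel_size": 6},
    },
    {
        "test_name": "serving_example_pp6_sharegpt",
        "server_parameters": {"pipeline_parallel_size": 6},
    },
]

print(pp_mismatches(sample_tests))
```

With the hypothetical entries above, only the first test is reported, since its name says 3 while the configuration says 6.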

gemini-code-assist · 2025-07-24T02:24:49Z

.buildkite/nightly-benchmarks/tests/serving-tests-cpu-snc2.json

+            "VLLM_RPC_TIMEOUT": 100000,
+	    "VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
+	    "VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
+	    "VLLM_CPU_SGL_KERNE": 1,


There appears to be a typo in the environment variable name. It should be VLLM_CPU_SGL_KERNEL instead of VLLM_CPU_SGL_KERNE. This typo will cause the environment variable to be ignored, preventing the intended performance optimizations from being applied. The correct spelling is confirmed in vllm/envs.py.

"VLLM_CPU_SGL_KERNEL": 1,
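A misspelled variable like this one is silently ignored, so a small pre-flight check can help. The sketch below is illustrative only: the set of known names is copied by hand from the diff above (with the corrected spelling), whereas a real check would presumably derive it from vllm/envs.py. The helper name find_unknown_env_vars and the sample fragment are hypothetical.

```python
import difflib
import json

# Assumption: names taken from the diff under review, with the corrected
# spelling. A real check would build this set from vllm/envs.py instead.
KNOWN_ENV_VARS = {
    "VLLM_RPC_TIMEOUT",
    "VLLM_ALLOW_LONG_MAX_MODEL_LEN",
    "VLLM_ENGINE_ITERATION_TIMEOUT_S",
    "VLLM_CPU_SGL_KERNEL",
}


def find_unknown_env_vars(env_block):
    """Map each unrecognized variable name to its closest known name (or None)."""
    problems = {}
    for name in env_block:
        if name not in KNOWN_ENV_VARS:
            matches = difflib.get_close_matches(name, sorted(KNOWN_ENV_VARS), n=1)
            problems[name] = matches[0] if matches else None
    return problems


# Hypothetical config fragment mirroring the lines under review.
sample = json.loads("""
{
    "VLLM_RPC_TIMEOUT": 100000,
    "VLLM_ALLOW_LONG_MAX_MODEL_LEN": 1,
    "VLLM_ENGINE_ITERATION_TIMEOUT_S": 120,
    "VLLM_CPU_SGL_KERNE": 1
}
""")

print(find_unknown_env_vars(sample))
```

Run against the fragment above, the check flags VLLM_CPU_SGL_KERNE and suggests VLLM_CPU_SGL_KERNEL as the nearest known name.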

gemini-code-assist · 2025-07-24T02:24:50Z