Commit e1ab6d3

Authored by wangxiyuan
[Misc] Refactor additional_config (#1029)
More and more config options are being added to `additional_config`. This PR provides a new `AscendConfig` class to manage these options in an easier way, making the code cleaner and more readable. It also adds the `additional_config` documentation for users, and adds `test_ascend_config.py` to make sure the new `AscendConfig` works as expected.

TODO: add an e2e test with torchair and deepseek once the CI resource is available.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
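For orientation, here is a minimal sketch of how the new accessor is used, based on the attributes exercised in the `tests/singlecard/test_ascend_config.py` file added by this commit. It is an illustration, not an excerpt from the change, and the model name is only a placeholder:

```python
from vllm import LLM

from vllm_ascend.ascend_config import get_ascend_config

# Build an engine with an additional_config dict; the plugin parses it into
# a global AscendConfig while the engine is constructed.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    additional_config={"expert_tensor_parallel_size": 1},
)

# Read the parsed config back. Calling this before any engine exists raises
# RuntimeError (see test_ascend_config_init_error in the new test file).
ascend_config = get_ascend_config()
print(ascend_config.expert_tensor_parallel_size)      # 1
print(ascend_config.torchair_graph_config.enabled)    # False by default
print(ascend_config.ascend_scheduler_config.enabled)  # False by default
```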
1 parent 7737aaa commit e1ab6d3

23 files changed (+456 additions, -208 deletions)

.github/workflows/vllm_ascend_test.yaml

Lines changed: 11 additions & 2 deletions
@@ -112,7 +112,13 @@ jobs:
      pytest -sv tests/singlecard/test_scheduler.py
      # guided decoding doesn't work, fix it later
      # pytest -sv tests/singlecard/test_guided_decoding.py.py
-     pytest -sv tests/singlecard/ --ignore=tests/singlecard/test_offline_inference.py --ignore=tests/singlecard/test_scheduler.py --ignore=tests/singlecard/test_guided_decoding.py
+     # test_ascend_config.py should be ran separately because it will regenerate the global config many times.
+     pytest -sv tests/singlecard/test_ascend_config.py
+     pytest -sv tests/singlecard/ \
+       --ignore=tests/singlecard/test_offline_inference.py \
+       --ignore=tests/singlecard/test_scheduler.py \
+       --ignore=tests/singlecard/test_guided_decoding.py \
+       --ignore=tests/singlecard/test_ascend_config.py
      else
      pytest -sv tests/multicard/test_ilama_lora_tp2.py
      VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py
@@ -128,11 +134,14 @@ jobs:
      # guided decoding doesn't work, fix it later
      # pytest -sv tests/singlecard/test_guided_decoding.py.py
      pytest -sv tests/singlecard/test_camem.py
+     # test_ascend_config.py should be ran separately because it will regenerate the global config many times.
+     pytest -sv tests/singlecard/test_ascend_config.py
      pytest -sv tests/singlecard/ \
        --ignore=tests/singlecard/test_offline_inference.py \
        --ignore=tests/singlecard/test_scheduler.py \
        --ignore=tests/singlecard/test_guided_decoding.py \
-       --ignore=tests/singlecard/test_camem.py
+       --ignore=tests/singlecard/test_camem.py \
+       --ignore=tests/singlecard/test_ascend_config.py
      else
      pytest -sv tests/multicard/test_ilama_lora_tp2.py
      # Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ faqs
  user_guide/suppoted_features
  user_guide/supported_models
  user_guide/env_vars
+ user_guide/additional_config
  user_guide/release_notes
  :::

docs/source/user_guide/additional_config.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@

# Additional Configuration

Additional configuration is a mechanism provided by vLLM that allows plugins to control their own inner behavior. vLLM Ascend uses this mechanism to make the project more flexible.

## How to use

Additional configuration can be used in either online or offline mode. Take Qwen3 as an example:

**Online mode**:

```bash
vllm serve Qwen/Qwen3-8B --additional-config='{"config_key":"config_value"}'
```

**Offline mode**:

```python
from vllm import LLM

LLM(model="Qwen/Qwen3-8B", additional_config={"config_key":"config_value"})
```

### Configuration options

The following table lists the additional configuration options available in vLLM Ascend:

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
| `ascend_scheduler_config` | dict | `{}` | The config options for the ascend scheduler |
| `expert_tensor_parallel_size` | str | `1` | Expert tensor parallel size for the model to use |

The details of each config option are as follows:

**torchair_graph_config**

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `enabled` | bool | `False` | Whether to enable torchair graph mode |
| `use_cached_graph` | bool | `False` | Whether to use the cached graph |
| `graph_batch_sizes` | list[int] | `[]` | The batch sizes for the torchair graph cache |
| `graph_batch_sizes_init` | bool | `False` | Initialize the graph batch sizes dynamically if `graph_batch_sizes` is empty |
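As a hedged illustration of the options above (not taken from this commit's doc), a torchair graph configuration is passed like any other `additional_config` entry. The model name below is only an assumed placeholder; the tests added in this commit expect torchair graph mode to work only with DeepSeek models:

```python
from vllm import LLM

# Sketch: enable torchair graph mode with graph caching. Per this commit's
# tests, enabling torchair graph mode for non-DeepSeek models raises
# NotImplementedError, so the placeholder below assumes a DeepSeek checkpoint.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    additional_config={
        "torchair_graph_config": {
            "enabled": True,
            "use_cached_graph": True,
        },
    },
)
```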

**ascend_scheduler_config**

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `enabled` | bool | `False` | Whether to enable the ascend scheduler for the V1 engine |

`ascend_scheduler_config` also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `chunked_prefill_enabled: true` to `ascend_scheduler_config` as well.
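A short sketch of that pass-through, assuming the same offline API as the earlier examples (illustrative only, not part of the original doc):

```python
from vllm import LLM

# Sketch: turn on the ascend scheduler and forward a vLLM SchedulerConfig
# option (chunked_prefill_enabled) through the same nested dict.
llm = LLM(
    model="Qwen/Qwen3-8B",
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "chunked_prefill_enabled": True,
        },
    },
)
```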

### Example

A full example of additional configuration is as follows:

```json
{
    "torchair_graph_config": {
        "enabled": true,
        "use_cached_graph": true,
        "graph_batch_sizes": [1, 2, 4, 8],
        "graph_batch_sizes_init": false
    },
    "ascend_scheduler_config": {
        "enabled": true,
        "chunked_prefill_enabled": true
    },
    "expert_tensor_parallel_size": 1
}
```

examples/dp_offline/data_parallel.py

Lines changed: 3 additions & 1 deletion
@@ -62,7 +62,9 @@ def main():
          max_num_seqs=num_seqs,
          additional_config={
              'expert_tensor_parallel_size': etp_size,
-             'enable_graph_mode': False,
+             'torchair_graph_config': {
+                 'enabled': False,
+             },
          })

      outputs = llm.generate(prompts, sampling_params)
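For downstream scripts that still pass the removed `enable_graph_mode` flag, the mapping to the new nested layout is one-to-one. A hypothetical migration helper, shown only as a sketch and not part of this commit:

```python
# Hypothetical helper: map the removed flat flag to the nested
# torchair_graph_config layout introduced by this commit.
def migrate_additional_config(old: dict) -> dict:
    new = dict(old)
    if "enable_graph_mode" in new:
        enabled = new.pop("enable_graph_mode")
        new.setdefault("torchair_graph_config", {})["enabled"] = enabled
    return new


print(migrate_additional_config({
    "expert_tensor_parallel_size": 4,
    "enable_graph_mode": True,
}))
# {'expert_tensor_parallel_size': 4, 'torchair_graph_config': {'enabled': True}}
```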

tests/long_term/spec_decode/e2e/conftest.py

Lines changed: 5 additions & 5 deletions
@@ -167,17 +167,17 @@ def run_equality_correctness_test(

      # TODO current torchair graph mode needs clean torchair cache.
      # if do not clean, it will raise error
-     additional_config = common_llm_kwargs.get("additional_config")
-     enable_graph_mode = additional_config.get(
-         "enable_graph_mode") if additional_config else False
+     torchair_graph_enabled = common_llm_kwargs.get(
+         "additional_config", {}).get("torchair_graph_config",
+                                      {}).get("enabled", False)

      with vllm_runner(**org_args) as vllm_model:
-         if enable_graph_mode:
+         if torchair_graph_enabled:
              _clean_torchair_cache()
          org_outputs = vllm_model.generate_w_logprobs(prompts, sampling_params)

      with vllm_runner(**sd_args) as vllm_model:
-         if enable_graph_mode:
+         if torchair_graph_enabled:
              _clean_torchair_cache()
          if ensure_all_accepted or expected_acceptance_rate is not None:
              # Force log interval to be 0 to catch all metrics.
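The chained `.get()` calls above default every level to an empty dict, so the flag resolves to `False` whenever `additional_config` or `torchair_graph_config` is absent. A standalone check of that behavior (illustrative, not from the commit):

```python
def torchair_enabled(common_llm_kwargs: dict) -> bool:
    # Mirrors the lookup introduced in the conftest change above.
    return common_llm_kwargs.get("additional_config",
                                 {}).get("torchair_graph_config",
                                         {}).get("enabled", False)


assert torchair_enabled({}) is False
assert torchair_enabled(
    {"additional_config": {"torchair_graph_config": {"enabled": True}}}) is True
```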

tests/long_term/spec_decode/e2e/test_mtp_correctness.py

Lines changed: 6 additions & 2 deletions
@@ -218,7 +218,9 @@ def test_mtp_e2e_greedy_logprobs(vllm_runner, common_llm_kwargs,
      "common_llm_kwargs",
      [{
          "additional_config": {
-             'enable_graph_mode': True,
+             'torchair_graph_config': {
+                 "enabled": True,
+             },
          },

          # Print spec metrics.
@@ -262,7 +264,9 @@ def test_mtp_e2e_greedy_correctness_torchair_graph(
      "common_llm_kwargs",
      [{
          "additional_config": {
-             'enable_graph_mode': True,
+             'torchair_graph_config': {
+                 "enabled": True,
+             },
          },

          # Print spec metrics.

tests/multicard/test_dynamic_npugraph_batchsize.py

Lines changed: 0 additions & 5 deletions
@@ -18,8 +18,6 @@
  import torch
  from vllm import LLM, SamplingParams

- from vllm_ascend.utils import vllm_version_is
-
  MODELS = [
      "Qwen/Qwen2.5-0.5B-Instruct",
  ]
@@ -32,9 +30,6 @@
  ]


- @pytest.mark.skipif(
-     (vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1")),
-     reason="aclgraph not supported in v0.8.5 and v0.8.5.post1")
  @pytest.mark.parametrize("model", MODELS)
  @pytest.mark.parametrize("tp_size", TENSOR_PARALLELS)
  @pytest.mark.parametrize("max_tokens", [64])

tests/multicard/test_offline_inference_distributed.py

Lines changed: 2 additions & 6 deletions
@@ -31,9 +31,7 @@

  def test_models_distributed_QwQ():
      example_prompts = [
-         "vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.",
-         "Briefly describe the major milestones in the development of artificial intelligence from 1950 to 2020.",
-         "Compare and contrast artificial intelligence with human intelligence in terms of processing information.",
+         "Hello, my name is",
      ]
      dtype = "half"
      max_tokens = 5
@@ -48,9 +46,7 @@ def test_models_distributed_QwQ():

  def test_models_distributed_DeepSeek():
      example_prompts = [
-         "vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.",
-         "Briefly describe the major milestones in the development of artificial intelligence from 1950 to 2020.",
-         "Compare and contrast artificial intelligence with human intelligence in terms of processing information.",
+         "Hello, my name is",
      ]
      dtype = "half"
      max_tokens = 5

tests/compile/test_aclgraph.py renamed to tests/singlecard/test_aclgraph.py

Lines changed: 0 additions & 7 deletions
@@ -28,16 +28,12 @@

  from tests.conftest import VllmRunner
  from tests.model_utils import check_outputs_equal
- from vllm_ascend.utils import vllm_version_is

  MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]


  @pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "0",
                      reason="aclgraph only support on v1")
- @pytest.mark.skipif(
-     (vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1")),
-     reason="aclgraph not supported in v0.8.5 and v0.8.5.post1")
  @pytest.mark.parametrize("model", MODELS)
  @pytest.mark.parametrize("max_tokens", [32])
  def test_models(
@@ -88,9 +84,6 @@ def test_models(

  @pytest.mark.skipif(os.getenv("VLLM_USE_V1") == "0",
                      reason="aclgraph only support on v1")
- @pytest.mark.skipif(
-     (vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1")),
-     reason="aclgraph not supported in v0.8.5 and v0.8.5.post1")
  def test_deepseek_raises_error(monkeypatch: pytest.MonkeyPatch) -> None:
      with monkeypatch.context() as m:
          m.setenv("VLLM_USE_MODELSCOPE", "True")
tests/singlecard/test_ascend_config.py

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pytest

from tests.conftest import VllmRunner
from vllm_ascend.ascend_config import clear_ascend_config, get_ascend_config


def _clean_up_ascend_config(func):

    def wrapper(*args, **kwargs):
        clear_ascend_config()
        func(*args, **kwargs)
        clear_ascend_config()

    return wrapper


@_clean_up_ascend_config
def test_run_without_ascend_config():
    with VllmRunner("facebook/opt-125m"):
        ascend_config = get_ascend_config()

        assert not ascend_config.torchair_graph_config.enabled
        assert not ascend_config.torchair_graph_config.use_cached_graph
        assert ascend_config.torchair_graph_config.graph_batch_sizes == []
        assert not ascend_config.torchair_graph_config.graph_batch_sizes_init
        assert not ascend_config.ascend_scheduler_config.enabled
        assert ascend_config.expert_tensor_parallel_size == 1


@_clean_up_ascend_config
def test_run_with_ascend_config():
    input_additional_config = {
        "torchair_graph_config": {
            # torchair graph only works with deepseek. The e2e test should be added
            # in multicard test with deepseek models.
            "enabled": False,
            "use_cached_graph": True,
            "graph_batch_sizes": [1, 2, 4, 8],
            "graph_batch_sizes_init": False,
        },
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_chunked_prefill": True,
        },
        "expert_tensor_parallel_size": 1
    }
    with VllmRunner("facebook/opt-125m",
                    additional_config=input_additional_config):
        ascend_config = get_ascend_config()

        assert not ascend_config.torchair_graph_config.enabled
        assert ascend_config.torchair_graph_config.use_cached_graph
        assert ascend_config.torchair_graph_config.graph_batch_sizes == [
            1, 2, 4, 8
        ]
        assert not ascend_config.torchair_graph_config.graph_batch_sizes_init
        assert ascend_config.ascend_scheduler_config.enabled
        assert ascend_config.ascend_scheduler_config.enable_chunked_prefill
        assert ascend_config.expert_tensor_parallel_size == 1


@_clean_up_ascend_config
def test_ascend_config_init_error():
    # ascend_config should be initialized first
    with pytest.raises(RuntimeError):
        _ = get_ascend_config()


@_clean_up_ascend_config
def test_ascend_config_load_error():
    # graph_batch_sizes should be list.
    with pytest.raises(TypeError):
        input_additional_config_fake_1 = {
            "torchair_graph_config": {
                "graph_batch_sizes": "fake_size",
            },
        }
        with VllmRunner("facebook/opt-125m",
                        additional_config=input_additional_config_fake_1):
            pass

    # graph_batch_sizes_init should not be True when graph_batch_sizes is not empty.
    with pytest.raises(ValueError):
        input_additional_config_fake_2 = {
            "torchair_graph_config": {
                "graph_batch_sizes": [1, 2, 4, 8],
                "graph_batch_sizes_init": True,
            },
        }
        with VllmRunner("facebook/opt-125m",
                        additional_config=input_additional_config_fake_2):
            pass

    # torchair graph only works with deepseek.
    with pytest.raises(NotImplementedError):
        input_additional_config_fake_2 = {
            "torchair_graph_config": {
                "enabled": True,
            },
        }
        with VllmRunner("facebook/opt-125m",
                        additional_config=input_additional_config_fake_2):
            pass
