
Commit 8d00775

Yikun, yiz-liu, and mengwei805 authored
[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109)
### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid oom, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, especially multicard CI
- For spec decode test, long term CI passed

Closes: #1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
1 parent e9ada68 commit 8d00775

File tree

2 files changed: +13 −1 lines changed


.github/workflows/vllm_ascend_test.yaml

Lines changed: 7 additions & 1 deletion
@@ -123,7 +123,11 @@ jobs:
             --ignore=tests/singlecard/test_camem.py
           else
             pytest -sv tests/multicard/test_ilama_lora_tp2.py
-            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py
+            # To avoid oom, we need to run the test in a single process.
+            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
+            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
+            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
+            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py
           fi

       - name: Run vllm-project/vllm-ascend test on V0 engine
@@ -149,7 +153,9 @@ jobs:
           else
             pytest -sv tests/multicard/test_ilama_lora_tp2.py
             # Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.
+            # To avoid oom, we need to run the test in a single process.
             VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
             VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
+            VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
             VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py
           fi
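The reason for the split above: by default pytest runs every collected test inside one long-lived Python process, so device memory held by one distributed model test is not guaranteed to be released before the next test starts. Giving each heavyweight case its own pytest invocation means each one gets a fresh process. A minimal shell sketch of the same pattern (not taken from the workflow; it only reuses the test names already listed above):

```bash
# Sketch only: run each memory-hungry distributed test in its own pytest
# process so accelerator memory is freed when that process exits.
for case in test_models_distributed_QwQ \
            test_models_distributed_DeepSeek \
            test_models_distributed_topk; do
  VLLM_USE_MODELSCOPE=True pytest -sv \
    "tests/multicard/test_offline_inference_distributed.py::${case}"
done

# The remaining multicard tests still run as one batch, with the files
# already covered above excluded.
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ \
  --ignore=tests/multicard/test_ilama_lora_tp2.py \
  --ignore=tests/multicard/test_offline_inference_distributed.py
```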

vllm_ascend/patch/worker/patch_common/patch_spec_decode_worker.py

Lines changed: 6 additions & 0 deletions
@@ -56,6 +56,12 @@ def create_worker(
             draft_worker_kwargs.pop("ngram_prompt_lookup_max"))
         ngram_prompt_lookup_min = (
             draft_worker_kwargs.pop("ngram_prompt_lookup_min"))
+
+        # TODO(Yizhou): A quick fix, must be refactored ASAP
+        draft_worker_kwargs["vllm_config"].parallel_config.expert_parallel_size = 1
+        draft_worker_kwargs[
+            "vllm_config"].parallel_config.expert_tensor_parallel_size = 1
+
         draft_model_config = draft_worker_kwargs["vllm_config"].model_config
         draft_parallel_config: ParallelConfig = draft_worker_kwargs[
             'vllm_config'].parallel_config
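The patch pins the draft worker's expert-parallel settings to 1 before the rest of create_worker reads the draft ParallelConfig, effectively running the draft model without expert parallelism. A simplified, hypothetical sketch of the same default-setting step (the helper name is illustrative and not part of the actual vllm-ascend patch):

```python
# Hypothetical helper mirroring the quick fix above: give the draft worker's
# ParallelConfig explicit expert-parallel defaults so speculative decoding
# does not depend on attributes left unset on Ascend.
def _apply_draft_parallel_defaults(draft_worker_kwargs: dict) -> None:
    parallel_config = draft_worker_kwargs["vllm_config"].parallel_config
    # TODO(Yizhou): quick fix, to be refactored ASAP -- disable expert
    # parallelism for the draft model by pinning both sizes to 1.
    parallel_config.expert_parallel_size = 1
    parallel_config.expert_tensor_parallel_size = 1
```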
