Commit f15be0e

whx-sjtu authored and wangxiaoxin (A) committed
[Scheduler][MTP] Add support for speculative decoding in AscendScheduler. (#943)

This PR adds support for speculative decoding in AscendScheduler. It also includes partial support for disaggregated prefill; full support will be merged in a follow-up PR.

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
1 parent 0dfe5f5 commit f15be0e

5 files changed, +1002 -50 lines changed


.github/workflows/vllm_ascend_test.yaml

Lines changed: 9 additions & 6 deletions
@@ -178,18 +178,20 @@ jobs:
   run: |
     if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
       VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
-      pytest -sv tests/singlecard/test_scheduler.py
       # guided decoding doesn't work, fix it later
       # pytest -sv tests/singlecard/test_guided_decoding.py.py
       # test_ascend_config.py should be ran separately because it will regenerate the global config many times.
       pytest -sv tests/singlecard/test_ascend_config.py
       pytest -sv tests/singlecard/test_camem.py
+      # pytest -sv tests/singlecard/core/test_ascend_scheduler.py
+      # pytest -sv tests/singlecard/core/test_ascend_scheduler_e2e.py
       pytest -sv tests/singlecard/ \
         --ignore=tests/singlecard/test_offline_inference.py \
-        --ignore=tests/singlecard/test_scheduler.py \
         --ignore=tests/singlecard/test_guided_decoding.py \
         --ignore=tests/singlecard/test_ascend_config.py \
-        --ignore=tests/singlecard/test_camem.py
+        --ignore=tests/singlecard/test_camem.py \
+        --ignore=tests/singlecard/core/test_ascend_scheduler.py \
+        --ignore=tests/singlecard/core/test_ascend_scheduler_e2e.py
     else
       pytest -sv tests/multicard/test_ilama_lora_tp2.py
       # To avoid oom, we need to run the test in a single process.
@@ -207,20 +209,21 @@ jobs:
   run: |
     if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
       VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
-      pytest -sv tests/singlecard/test_scheduler.py
       # guided decoding doesn't work, fix it later
       # pytest -sv tests/singlecard/test_guided_decoding.py.py
       pytest -sv tests/singlecard/test_camem.py
       # test_ascend_config.py should be ran separately because it will regenerate the global config many times.
       pytest -sv tests/singlecard/test_ascend_config.py
       pytest -sv tests/singlecard/test_prompt_embedding.py
+      pytest -sv tests/singlecard/core/test_ascend_scheduler.py
       pytest -sv tests/singlecard/ \
         --ignore=tests/singlecard/test_offline_inference.py \
-        --ignore=tests/singlecard/test_scheduler.py \
         --ignore=tests/singlecard/test_guided_decoding.py \
         --ignore=tests/singlecard/test_camem.py \
         --ignore=tests/singlecard/test_ascend_config.py \
-        --ignore=tests/singlecard/test_prompt_embedding.py
+        --ignore=tests/singlecard/test_prompt_embedding.py \
+        --ignore=tests/singlecard/core/test_ascend_scheduler.py \
+        --ignore=tests/singlecard/core/test_ascend_scheduler_e2e.py
     else
       pytest -sv tests/multicard/test_ilama_lora_tp2.py
       # Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.

tests/singlecard/core/__init__.py

Whitespace-only changes.
