### Motivation.

Most of the key features of vllm-ascend are currently not covered by CI, which puts them at high risk of being broken by future changes. This RFC lists all the key features that are not yet covered by CI, and tracks the work of adding test scripts for them step by step.
### Proposed Change.
#### UTs

The UTs that need to be added are listed in #1298.

#### Key features that need E2E tests
- doc tests: we could use `pytest-markdown-docs` to test the Python scripts in the docs, but it does not perform setup and teardown the way pytest does, which risks OOM issues
- installation instruction test (including from source code, from wheels on PyPI, and using the docker image)
- single-card example test (already done in the e2e test, for both LLM and vLLM)
- multi-card example test (both `mp` and `ray` backends)
- quick_start Add e2e test frame work and doctest #730
- installation_pip_from_binary
- installation_pip_from_source
- installation_docker_from_image
- installation_docker_from_source
- tutorial_qwen3_8b
- tutorial_qwen2.5_vl_7b
- tutorial_qwq_32B
- tutorial_qwq_32B_w8a8
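One way to address the setup/teardown gap noted for `pytest-markdown-docs` above is to run each documentation snippet in its own interpreter process, so host/NPU memory is fully reclaimed between snippets. A minimal sketch, using only the standard library (the helper names are illustrative, not an existing vllm-ascend API):

```python
import re
import subprocess
import sys

# Matches fenced ```python blocks in a markdown document.
FENCE = re.compile(r"```python\n(.*?)```", re.DOTALL)

def extract_python_blocks(markdown_text):
    """Return the python code blocks embedded in a markdown document."""
    return [m.group(1) for m in FENCE.finditer(markdown_text)]

def run_block_isolated(code, timeout=300):
    """Run one doc snippet in a fresh interpreter so its memory is
    released afterwards (an in-process runner lets allocations
    accumulate across snippets, which is the OOM risk noted above)."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr
```

Each tutorial page (quick_start, tutorial_qwen3_8b, ...) would then be one parametrized case over its extracted blocks.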
- spec decoding
  - [1/N][CI/UT] enable spec decode related UT #425
  - add `tests/spec_decode/test_scorer.py` when the precision issue is fixed
  - add a series of e2e tests for spec decode: `tests/spec_decode/e2e/test_*.py`
  - e2e correctness test for eagle and multi-step
  - sync the above to main
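For the eagle / multi-step e2e correctness tests, the property under test is that speculative decoding with greedy verification reproduces the target model's greedy output exactly. The acceptance rule can be sketched as a toy reference (not vLLM's implementation; token values are placeholders):

```python
def greedy_verify(draft_tokens, target_argmax):
    """Greedy speculative-decoding verification: accept the longest
    prefix of draft tokens that matches the target model's argmax at
    each position, then emit the target's correction (or bonus) token.

    draft_tokens: tokens proposed by the draft model.
    target_argmax: target-model argmax at each draft position plus one
        bonus position, so len(target_argmax) == len(draft_tokens) + 1.
    Returns the accepted token sequence, identical to what plain
    greedy decoding of the target model would have produced.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == target_argmax[i]:
            accepted.append(tok)
        else:
            accepted.append(target_argmax[i])  # correction token, stop here
            return accepted
    accepted.append(target_argmax[len(draft_tokens)])  # all matched: bonus token
    return accepted
```

An e2e correctness test then amounts to asserting that the spec-decode run and the plain greedy run emit identical token ids.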
- basic correctness [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) #460
- multi-modality (VLMs)
- guided decoding [CI]Add guided decoding test #422
- Parallel Mechanism
  - tp
    - e2e tp inference [CI] Add new runner and enable QwQ multinpu test #417
    - tp with ray backend
  - pp -- not supported on v1 yet
  - dp
  - ep
  - etp
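A CI matrix for the parallel modes above has to respect the runner's card budget: a combined deployment occupies tp × dp devices. A small helper to enumerate the valid (tp, dp) pairs for a runner (illustrative only; the ep/etp dimensions add further divisibility constraints not modeled here):

```python
from itertools import product

def parallel_configs(num_cards):
    """Enumerate (tp, dp) pairs whose product exactly fills the
    available cards -- the basic constraint for combining
    tensor parallelism with data parallelism on one runner."""
    return [(tp, dp)
            for tp, dp in product(range(1, num_cards + 1), repeat=2)
            if tp * dp == num_cards]
```

On a 4-card runner this yields (1, 4), (2, 2) and (4, 1), matching the tp2 + dp2 and tp4 combinations that recur in the graph-mode test list below.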
- torchair graph mode
  - v1 + torchair graph mode + ascend scheduler + tp4 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning` [CI/UT][Graph] Add ut for torchair graph mode #1103
  - v1 + torchair graph mode + ascend scheduler + tp4 + mc2 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp4 + multi-stream e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 + ep e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 + ep + etp e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
- aclgraph
  - v1 + aclgraph e2e correctness ut on `Qwen/Qwen2.5-0.5B-Instruct` [aclgraph] implentment NPUPiecewiseBackend to enable aclgraph #836
  - v1 + aclgraph + tp2 + dp2 e2e correctness ut on `Qwen/Qwen2.5-0.5B-Instruct`
  - v1 + aclgraph e2e correctness ut on `Qwen/Qwen3-235B-A22B` [aclgraph] implentment NPUPiecewiseBackend to enable aclgraph #836
  - v1 + aclgraph + tp2 + dp2 e2e correctness ut on `Qwen/Qwen3-235B-A22B`
- disaggregated prefill
  - AscendSimpleConnector + online serve + disaggregated prefill + `deepseek-ai/DeepSeek-V2-Lite` on single node [CI/UT][PD Disaggreate] Initialize PD Disaggreate UT #889
  - LLMDataDistCMgrConnector + online serve + disaggregated prefill + `Qwen/Qwen2.5-0.5B-Instruct` Disaggregate prefill for kv cache register style (merge into v0.9.1-dev) #1296 (todo: backport to main)
  - LLMDataDistConnector + online serve + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + ep + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + ep + torchair graph mode + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
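The correctness criterion for every disaggregated-prefill combination above is the same: a decode instance fed a transferred KV cache must emit exactly the tokens a monolithic instance would. That invariant can be illustrated with a deterministic toy model (a pure simulation; no relation to the real connectors, whose transfer is over device memory):

```python
def fake_attn_step(kv, token):
    """Deterministic toy 'model' step: append the token to the KV
    cache and derive the next token from the full history."""
    kv = kv + [token]
    return kv, (sum(kv) * 31 + len(kv)) % 100

def generate(prompt, steps):
    """Monolithic run: prefill and decode on one instance."""
    kv, tok = [], None
    for t in prompt:                 # prefill
        kv, tok = fake_attn_step(kv, t)
    out = []
    for _ in range(steps):           # decode
        kv, tok = fake_attn_step(kv, tok)
        out.append(tok)
    return out

def disaggregated_generate(prompt, steps):
    """Disaggregated run: prefill node builds the KV cache, which is
    'transferred' (copied) to the decode node along with the first token."""
    kv, tok = [], None
    for t in prompt:                 # prefill node
        kv, tok = fake_attn_step(kv, t)
    kv = list(kv)                    # stand-in for the KV transfer
    out = []
    for _ in range(steps):           # decode node
        kv, tok = fake_attn_step(kv, tok)
        out.append(tok)
    return out
```

The e2e tests assert this equality on real models, once per connector/parallelism combination.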
- eplb
  - ut for `ExpertLoadBalancer` static EPLB fix bug, add unit test #1186
  - e2e test for eplb @songshanhu07 [EPLB]: Correct local expert number calculation with redundant experts && add e2e test #1223
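The title of #1223 points at the quantity under test: the number of experts hosted per EP rank once redundant replicas are added. A hypothetical version of that calculation (the function name and signature are invented for illustration; only the arithmetic reflects the PR's subject):

```python
def local_expert_count(global_experts, redundant_experts, ep_size):
    """Experts hosted on each EP rank when redundant replicas are
    distributed evenly: (global + redundant) / ep_size. The total
    replica count must divide the EP world size evenly."""
    total = global_experts + redundant_experts
    if total % ep_size != 0:
        raise ValueError("expert replicas must divide evenly across EP ranks")
    return total // ep_size
```

A UT for this would cover both the redundant-expert path (the bug fixed in #1223) and the zero-redundancy baseline.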
- AscendScheduler
  - V1 + Ascend scheduler [Scheduler][MTP] Add support for speculative decoding in AsecendScheduler. #943
  - V1 + Ascend scheduler vs V1 + Ascend scheduler + enable prefix cache vs V1 + Ascend scheduler + enable prefix cache + enable chunked prefill [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler #1505
  - V1 + Ascend scheduler vs V1 + Ascend scheduler + enable chunked prefill [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler #1505
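The prefix-cache comparisons above hinge on how many block-aligned prompt tokens can be reused from an earlier request; output equality plus nonzero reuse is what such a test asserts. A toy block-matching helper (illustrative only; the real scheduler hashes KV blocks rather than comparing tokens directly):

```python
def cached_prefix_blocks(prompt, cached_prompts, block_size):
    """Number of leading token blocks reusable from the prefix cache:
    the longest block-aligned prefix shared with any cached prompt."""
    best = 0
    for cached in cached_prompts:
        n = 0
        while ((n + 1) * block_size <= min(len(prompt), len(cached))
               and prompt[:(n + 1) * block_size] == cached[:(n + 1) * block_size]):
            n += 1
        best = max(best, n)
    return best
```

With prefix caching enabled, the second identical request should report a reuse count equal to its full block count, while the generated text stays byte-identical to the uncached run.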
- Accuracy Test
  - accuracy ci for DP and EP and TP, including models (`Qwen/Qwen2.5-0.5B-Instruct`, `Qwen/Qwen2.5-VL-3B-Instruct`, `Qwen/Qwen3-30B-A3B`, `deepseek-ai/DeepSeek-V2-Lite`) [CI] Add accuracy ci for DP and EP and TP and ETP #1140
  - accuracy ci for deepseek-v2-lite [Bugfix] Fix deepseek percision issue and add acc ci for it #905
#### Popular Models CI
- Qwen/Qwen2.5-0.5B-Instruct e2e test_offline_inference.py
- Qwen/QwQ-32B [CI] Add new runner and enable QwQ multinpu test #417
- deepseek-ai/DeepSeek-V2-Lite [CI] Add deepseek-v2-lite test #631
- Qwen/Qwen2.5-VL-3B-Instruct [CI] Add qwen2.5-vl test #643
- meta-llama/Llama-3.2-1B-Instruct
#### Entrypoints e2e CI

- offline inference
- online server
  - `vllm serve`
  - `python3 -m vllm.entrypoints.openai.api_server`
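The two online-server entrypoints are expected to behave identically, since `vllm serve` wraps the module invocation; the e2e CI would exercise both launch forms against the same OpenAI-compatible endpoint. A sketch of the commands (model name and port are placeholders, and the commands require NPUs, so they are shown as a fragment rather than run here):

```shell
# Form 1: the `vllm serve` CLI wrapper
vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000

# Form 2: the underlying module entrypoint -- should expose the same API
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-0.5B-Instruct --port 8000

# Smoke check against either form (OpenAI-compatible completions endpoint)
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello", "max_tokens": 8}'
```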
### Feedback Period.

No response

### CC List.

No response

### Any Other Things.

No response