### Motivation.

Most of the key features of vllm-ascend are currently not covered by CI, which puts them at high risk of being broken by future changes. This RFC lists all the key features that are not yet covered by CI, and tracks the work of adding test scripts for them step by step.
### Proposed Change.
#### UTs

The UTs that need to be added are listed in #1298.

#### Key features that need E2E tests
- doc tests: we could use `pytest-markdown-docs` to test the Python scripts in the docs, but it does not perform setup and teardown the way pytest does, which risks OOM issues
- installation instruction test (including from source code, from wheels on PyPI, and using the docker image)
- single-card example test (already done in the e2e test, for both LLM and vLLM)
- multi-card example test (both `mp` and `ray` backends)
- quick_start Add e2e test frame work and doctest #730
- installation_pip_from_binary
- installation_pip_from_source
- installation_docker_from_image
- installation_docker_from_source
- tutorial_qwen3_8b
- tutorial_qwen2.5_vl_7b
- tutorial_qwq_32B
- tutorial_qwq_32B_w8a8
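One way to address the setup/teardown gap noted for `pytest-markdown-docs` above is to run each documentation snippet in its own interpreter process, so host/NPU memory is fully reclaimed between snippets. A minimal sketch, using only the standard library (the helper names are illustrative, not an existing vllm-ascend API):

```python
import re
import subprocess
import sys

# Matches fenced ```python blocks in a markdown document.
FENCE = re.compile(r"```python\n(.*?)```", re.DOTALL)

def extract_python_blocks(markdown_text):
    """Return the python code blocks embedded in a markdown document."""
    return [m.group(1) for m in FENCE.finditer(markdown_text)]

def run_block_isolated(code, timeout=300):
    """Run one doc snippet in a fresh interpreter so its memory is
    released afterwards (an in-process runner lets allocations
    accumulate across snippets, which is the OOM risk noted above)."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr
```

Each tutorial page (quick_start, tutorial_qwen3_8b, ...) would then be one parametrized case over its extracted blocks.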
- spec decoding
  - [1/N][CI/UT] enable spec decode related UT #425
  - add `tests/spec_decode/test_scorer.py` when the precision issue is fixed
  - add a series of e2e tests for spec decode: `tests/spec_decode/e2e/test_*.py`
  - e2e correctness test for eagle and multi-step
  - sync the above to main
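For the eagle / multi-step e2e correctness tests, the property under test is that speculative decoding with greedy verification reproduces the target model's greedy output exactly. The acceptance rule can be sketched as a toy reference (not vLLM's implementation; token values are placeholders):

```python
def greedy_verify(draft_tokens, target_argmax):
    """Greedy speculative-decoding verification: accept the longest
    prefix of draft tokens that matches the target model's argmax at
    each position, then emit the target's correction (or bonus) token.

    draft_tokens: tokens proposed by the draft model.
    target_argmax: target-model argmax at each draft position plus one
        bonus position, so len(target_argmax) == len(draft_tokens) + 1.
    Returns the accepted token sequence, identical to what plain
    greedy decoding of the target model would have produced.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == target_argmax[i]:
            accepted.append(tok)
        else:
            accepted.append(target_argmax[i])  # correction token, stop here
            return accepted
    accepted.append(target_argmax[len(draft_tokens)])  # all matched: bonus token
    return accepted
```

An e2e correctness test then amounts to asserting that the spec-decode run and the plain greedy run emit identical token ids.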
- basic correctness [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) #460
- multi-modality (VLMs)
- guided decoding [CI]Add guided decoding test #422
- Parallel Mechanism
  - tp
    - e2e tp inference [CI] Add new runner and enable QwQ multinpu test #417
    - tp with ray backend
  - pp -- not supported on v1 yet
  - dp
  - ep
  - etp
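A CI matrix for the parallel modes above has to respect the runner's card budget: a combined deployment occupies tp × dp devices. A small helper to enumerate the valid (tp, dp) pairs for a runner (illustrative only; the ep/etp dimensions add further divisibility constraints not modeled here):

```python
from itertools import product

def parallel_configs(num_cards):
    """Enumerate (tp, dp) pairs whose product exactly fills the
    available cards -- the basic constraint for combining
    tensor parallelism with data parallelism on one runner."""
    return [(tp, dp)
            for tp, dp in product(range(1, num_cards + 1), repeat=2)
            if tp * dp == num_cards]
```

On a 4-card runner this yields (1, 4), (2, 2) and (4, 1), matching the tp2 + dp2 and tp4 combinations that recur in the graph-mode test list below.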
- torchair graph mode
  - v1 + torchair graph mode + ascend scheduler + tp4 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning` [CI/UT][Graph] Add ut for torchair graph mode #1103
  - v1 + torchair graph mode + ascend scheduler + tp4 + mc2 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp4 + multi-stream e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 + ep e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
  - v1 + torchair graph mode + ascend scheduler + tp2 + dp2 + ep + etp e2e correctness ut on `vllm-ascend/DeepSeek-V3-Pruning`
- aclgraph
  - v1 + aclgraph e2e correctness ut on `Qwen/Qwen2.5-0.5B-Instruct` [aclgraph] implentment NPUPiecewiseBackend to enable aclgraph #836
  - v1 + aclgraph + tp2 + dp2 e2e correctness ut on `Qwen/Qwen2.5-0.5B-Instruct`
  - v1 + aclgraph e2e correctness ut on `Qwen/Qwen3-235B-A22B` [aclgraph] implentment NPUPiecewiseBackend to enable aclgraph #836
  - v1 + aclgraph + tp2 + dp2 e2e correctness ut on `Qwen/Qwen3-235B-A22B`
- disaggregated prefill
  - AscendSimpleConnector + online serve + disaggregated prefill + `deepseek-ai/DeepSeek-V2-Lite` on single node [CI/UT][PD Disaggreate] Initialize PD Disaggreate UT #889
  - LLMDataDistCMgrConnector + online serve + disaggregated prefill + `Qwen/Qwen2.5-0.5B-Instruct` Disaggregate prefill for kv cache register style (merge into v0.9.1-dev) #1296 (todo: backport to main)
  - LLMDataDistConnector + online serve + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + ep + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
  - LLMDataDistConnector + online serve + tp + dp + ep + torchair graph mode + disaggregated prefill + `vllm-ascend/DeepSeek-V3-Pruning` on single node
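The correctness criterion for every disaggregated-prefill combination above is the same: a decode instance fed a transferred KV cache must emit exactly the tokens a monolithic instance would. That invariant can be illustrated with a deterministic toy model (a pure simulation; no relation to the real connectors, whose transfer is over device memory):

```python
def fake_attn_step(kv, token):
    """Deterministic toy 'model' step: append the token to the KV
    cache and derive the next token from the full history."""
    kv = kv + [token]
    return kv, (sum(kv) * 31 + len(kv)) % 100

def generate(prompt, steps):
    """Monolithic run: prefill and decode on one instance."""
    kv, tok = [], None
    for t in prompt:                 # prefill
        kv, tok = fake_attn_step(kv, t)
    out = []
    for _ in range(steps):           # decode
        kv, tok = fake_attn_step(kv, tok)
        out.append(tok)
    return out

def disaggregated_generate(prompt, steps):
    """Disaggregated run: prefill node builds the KV cache, which is
    'transferred' (copied) to the decode node along with the first token."""
    kv, tok = [], None
    for t in prompt:                 # prefill node
        kv, tok = fake_attn_step(kv, t)
    kv = list(kv)                    # stand-in for the KV transfer
    out = []
    for _ in range(steps):           # decode node
        kv, tok = fake_attn_step(kv, tok)
        out.append(tok)
    return out
```

The e2e tests assert this equality on real models, once per connector/parallelism combination.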
- eplb
  - ut for `ExpertLoadBalancer` static EPLB fix bug, add unit test #1186
  - e2e test for eplb @songshanhu07 [EPLB]: Correct local expert number calculation with redundant experts && add e2e test #1223
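The title of #1223 points at the quantity under test: the number of experts hosted per EP rank once redundant replicas are added. A hypothetical version of that calculation (the function name and signature are invented for illustration; only the arithmetic reflects the PR's subject):

```python
def local_expert_count(global_experts, redundant_experts, ep_size):
    """Experts hosted on each EP rank when redundant replicas are
    distributed evenly: (global + redundant) / ep_size. The total
    replica count must divide the EP world size evenly."""
    total = global_experts + redundant_experts
    if total % ep_size != 0:
        raise ValueError("expert replicas must divide evenly across EP ranks")
    return total // ep_size
```

A UT for this would cover both the redundant-expert path (the bug fixed in #1223) and the zero-redundancy baseline.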
- AscendScheduler
  - V1 + Ascend scheduler [Scheduler][MTP] Add support for speculative decoding in AsecendScheduler. #943
  - V1 + Ascend scheduler vs V1 + Ascend scheduler + enable prefix cache vs V1 + Ascend scheduler + enable prefix cache + enable chunked prefill [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler #1505
  - V1 + Ascend scheduler vs V1 + Ascend scheduler + enable chunked prefill [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler #1505
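The prefix-cache comparisons above hinge on how many block-aligned prompt tokens can be reused from an earlier request; output equality plus nonzero reuse is what such a test asserts. A toy block-matching helper (illustrative only; the real scheduler hashes KV blocks rather than comparing tokens directly):

```python
def cached_prefix_blocks(prompt, cached_prompts, block_size):
    """Number of leading token blocks reusable from the prefix cache:
    the longest block-aligned prefix shared with any cached prompt."""
    best = 0
    for cached in cached_prompts:
        n = 0
        while ((n + 1) * block_size <= min(len(prompt), len(cached))
               and prompt[:(n + 1) * block_size] == cached[:(n + 1) * block_size]):
            n += 1
        best = max(best, n)
    return best
```

With prefix caching enabled, the second identical request should report a reuse count equal to its full block count, while the generated text stays byte-identical to the uncached run.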
- Accuracy Test
  - accuracy ci for DP and EP and TP, including models (`Qwen/Qwen2.5-0.5B-Instruct`, `Qwen/Qwen2.5-VL-3B-Instruct`, `Qwen/Qwen3-30B-A3B`, `deepseek-ai/DeepSeek-V2-Lite`) [CI] Add accuracy ci for DP and EP and TP and ETP #1140
  - accuracy ci for deepseek-v2-lite [Bugfix] Fix deepseek percision issue and add acc ci for it #905
#### Popular Models CI
- Qwen/Qwen2.5-0.5B-Instruct e2e test_offline_inference.py
- Qwen/QwQ-32B [CI] Add new runner and enable QwQ multinpu test #417
- deepseek-ai/DeepSeek-V2-Lite [CI] Add deepseek-v2-lite test #631
- Qwen/Qwen2.5-VL-3B-Instruct [CI] Add qwen2.5-vl test #643
- meta-llama/Llama-3.2-1B-Instruct
#### Entrypoints e2e CI

- offline inference
- online server
  - `vllm serve`
  - `python3 -m vllm.entrypoints.openai.api_server`
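The two online-server entrypoints are expected to behave identically, since `vllm serve` wraps the module invocation; the e2e CI would exercise both launch forms against the same OpenAI-compatible endpoint. A sketch of the commands (model name and port are placeholders, and the commands require NPUs, so they are shown as a fragment rather than run here):

```shell
# Form 1: the `vllm serve` CLI wrapper
vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000

# Form 2: the underlying module entrypoint -- should expose the same API
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-0.5B-Instruct --port 8000

# Smoke check against either form (OpenAI-compatible completions endpoint)
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello", "max_tokens": 8}'
```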
### Feedback Period.

No response

### CC List.

No response

### Any Other Things.

No response