Commit c1c5d56

[Doc] Update FAQ and add test guidance (#1360)

MengqingCao and Yikun authored

### What this PR does / why we need it?
- Add test guidance
- Add reduced-layer model guidance
- Update FAQ on deterministic calculation

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
1 parent 5f5800b commit c1c5d56

File tree

4 files changed (+225, -65 lines)

docs/source/developer_guide/contributing.md

Lines changed: 4 additions & 65 deletions
@@ -4,7 +4,7 @@
 It's recommended to set up a local development environment to build and test
 before you submit a PR.
 
-### Prepare environment and build
+### Setup development environment
 
 Theoretically, the vllm-ascend build is only supported on Linux because
 `vllm-ascend` dependency `torch_npu` only supports Linux.
@@ -48,72 +48,11 @@ bash format.sh
 git commit -sm "your commit info"
 ```
 
-### Testing
+🎉 Congratulations! You have completed the development environment setup.
 
-Although vllm-ascend CI provide integration test on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can run it
-locally. The simplest way to run these integration tests locally is through a container:
-
-```bash
-# Under Ascend NPU environment
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-
-export IMAGE=vllm-ascend-dev-image
-export CONTAINER_NAME=vllm-ascend-dev
-export DEVICE=/dev/davinci1
-
-# The first build will take about 10 mins (10MB/s) to download the base image and packages
-docker build -t $IMAGE -f ./Dockerfile .
-# You can also specify the mirror repo via setting VLLM_REPO to speedup
-# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
-
-docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
-    --device /dev/davinci_manager --device /dev/devmm_svm \
-    --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -ti $IMAGE bash
-
-cd vllm-ascend
-pip install -r requirements-dev.txt
-
-pytest tests/
-```
-
-
-### Run doctest
-
-vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files.
-The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows:
-
-```{code-block} bash
-:substitutions:
-
-# Update DEVICE according to your device (/dev/davinci[0-7])
-export DEVICE=/dev/davinci0
-# Update the vllm-ascend image
-export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
-docker run --rm \
-    --name vllm-ascend \
-    --device $DEVICE \
-    --device /dev/davinci_manager \
-    --device /dev/devmm_svm \
-    --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -v /root/.cache:/root/.cache \
-    -p 8000:8000 \
-    -it $IMAGE bash
-
-# Run doctest
-/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
-```
-
-This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).
+### Test locally
 
+You can refer to the [Testing](./testing.md) doc to help you set up the testing environment and run tests locally.
 
 ## DCO and Signed-off-by

docs/source/developer_guide/testing.md (new file)

Lines changed: 183 additions & 0 deletions

# Testing

This section explains how to write e2e tests and unit tests to verify the implementation of your feature.

## Setup test environment

The fastest way to set up the test environment is to use the main branch container image:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```{code-block} bash
:substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

::::

::::{tab-item} Multi cards
:sync: multi

```{code-block} bash
:substitutions:
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```
::::

:::::

After starting the container, you should install the required packages:

```bash
# Configure a pip mirror (optional, speeds up downloads in some regions)
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

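Optionally, run a quick sanity check (assuming the container was started as shown above) to confirm that vLLM imports cleanly and the NPUs are visible:

```bash
# vLLM should import and print its version; npu-smi should list the mounted devices
python -c "import vllm; print(vllm.__version__)"
npu-smi info
```
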
## Running tests

### Unit test

There are several principles to follow when writing unit tests:

- The test file path should mirror the source file path and use the `test_` prefix, e.g.: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`
- The vLLM Ascend tests use the unittest framework; see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to learn how to write unit tests.
- All unit tests must be able to run on CPU, so mock the device-related functions so that they run on the host (a minimal sketch follows the commands below).
  - Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).
- You can run the unit tests using `pytest`:

```bash
cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py
```

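Below is a minimal, self-contained sketch of a unit test following these conventions. It is illustrative only: `_npu_device_count` and `pick_tensor_parallel_size` are placeholder helpers, not functions from the repo; the point is that the device-touching call is mocked so the test passes on a CPU-only host.

```python
# tests/ut/worker/test_worker_v1.py (illustrative sketch)
import unittest
from unittest import mock


def _npu_device_count() -> int:
    """Placeholder for a device query; the real call would fail on a CPU-only host."""
    raise RuntimeError("No NPU available on this host")


def pick_tensor_parallel_size(requested: int) -> int:
    """Clamp the requested tensor-parallel size to the number of visible NPUs."""
    return min(requested, _npu_device_count())


class TestPickTensorParallelSize(unittest.TestCase):

    def test_clamps_to_device_count(self):
        # Mock the device-related call so the test runs anywhere, including CPU-only CI.
        with mock.patch(f"{__name__}._npu_device_count", return_value=4):
            self.assertEqual(pick_tensor_parallel_size(8), 4)
            self.assertEqual(pick_tensor_parallel_size(2), 2)


if __name__ == "__main__":
    unittest.main()
```

In a real test the helpers would be imported from `vllm_ascend` and the `mock.patch` target would point at the module under test, as in the `test_ascend_config.py` example above.
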
### E2E test

Although the vllm-ascend CI runs [e2e tests](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend hardware, you can also run them locally.

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```
::::

::::{tab-item} Multi cards
:sync: multi
```bash
cd /vllm-workspace/vllm-ascend/
# Run all multi card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```
::::

:::::

This reproduces the e2e tests run in CI: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

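If you are adding a new e2e test, the snippet below sketches the typical shape of an offline single-card test. The file name, model, and assertions are placeholders chosen for illustration; see the examples listed below for real tests from the repo.

```python
# tests/e2e/singlecard/test_my_feature.py (illustrative sketch)
import pytest
from vllm import LLM, SamplingParams

MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]


@pytest.mark.parametrize("model", MODELS)
def test_basic_generation(model: str) -> None:
    # Greedy sampling keeps the expected output stable across runs.
    sampling_params = SamplingParams(temperature=0, max_tokens=16)
    llm = LLM(model=model, max_model_len=1024)
    outputs = llm.generate(["Hello, my name is"], sampling_params)
    assert len(outputs) == 1
    assert len(outputs[0].outputs[0].text) > 0
```

Run it the same way as the other single-card tests, e.g. `VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_my_feature.py`.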

#### E2E test examples

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
- Online test example: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
- Reduced-layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)

CI resources are limited, so you might need to reduce the number of layers of the model. Below is an example of how to generate a reduced-layer model:
1. Fork the original model repo on ModelScope; we need all the files in the repo except for the weights.
2. Set `num_hidden_layers` to the expected number of layers in `config.json`, e.g., `{"num_hidden_layers": 2,}`.
3. Copy the following Python script as `generate_random_weight.py`. Set the parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed (an optional sanity check follows the script):

```python
import os

import torch
from transformers import AutoConfig

# modeling_deepseek.py comes from the model repo forked in step 1.
from modeling_deepseek import DeepseekV3ForCausalLM

MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

# Load the pruned config, build a randomly initialized model from it, and save the checkpoint.
config = AutoConfig.from_pretrained(os.path.expanduser(MODEL_LOCAL_PATH), trust_remote_code=True)
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)
```
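
As an optional sanity check, you can confirm that the saved checkpoint really has the reduced layer count by reading the generated `config.json` (the path below matches `DIST_MODEL_PATH` from the script above):

```python
import json

with open("./random_deepseek_v3_with_2_hidden_layer/config.json") as f:
    saved_config = json.load(f)

print(saved_config["num_hidden_layers"])  # expect 2 for the pruned model
```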

### Run doctest

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the doc files.
The doctest is a good way to make sure the docs are up to date and the examples are executable. You can run it locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).

docs/source/faqs.md

Lines changed: 37 additions & 0 deletions
@@ -126,3 +126,40 @@ And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after the tenso

### 17. Failed to reinstall vllm-ascend from source after uninstalling vllm-ascend?
You may encounter the problem of C compilation failure when reinstalling vllm-ascend from source using pip. If the installation fails, it is recommended to use `python setup.py install` to install, or use `python setup.py clean` to clear the cache.

### 18. How to generate deterministic results when using vllm-ascend?
There are several factors that affect output certainty:

1. Sampling method: use **greedy sampling** by setting `temperature=0` in `SamplingParams`, e.g.:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

2. Set the following environment variables (a combined serving example follows this list):

```bash
export LCCL_DETERMINISTIC=1
export HCCL_DETERMINISTIC=1
export ATB_MATMUL_SHUFFLE_K_ENABLE=0
export ATB_LLM_LCOC_ENABLE=0
```
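
For online serving, export the same variables in the shell that launches the server. A combined example (the model name is just an illustration):

```bash
# Export the determinism-related variables, then start the OpenAI-compatible server
export LCCL_DETERMINISTIC=1
export HCCL_DETERMINISTIC=1
export ATB_MATMUL_SHUFFLE_K_ENABLE=0
export ATB_LLM_LCOC_ENABLE=0
vllm serve Qwen/Qwen2.5-0.5B-Instruct
```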

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ user_guide/release_notes
 :caption: Developer Guide
 :maxdepth: 1
 developer_guide/contributing
+developer_guide/testing
 developer_guide/versioning_policy
 developer_guide/evaluation/index
 :::
