Commit c1c5d56

[Doc] Update FAQ and add test guidance (#1360)

MengqingCao and Yikun authored

### What this PR does / why we need it?
- Add test guidance
- Add reduced-layer model guidance
- Update FAQ on deterministic calculation

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
1 parent 5f5800b commit c1c5d56

File tree

4 files changed (+225, -65 lines)

docs/source/developer_guide/contributing.md

Lines changed: 4 additions & 65 deletions
@@ -4,7 +4,7 @@
 It's recommended to set up a local development environment to build and test
 before you submit a PR.
 
-### Prepare environment and build
+### Setup development environment
 
 Theoretically, the vllm-ascend build is only supported on Linux because
 `vllm-ascend` dependency `torch_npu` only supports Linux.
@@ -48,72 +48,11 @@ bash format.sh
 git commit -sm "your commit info"
 ```
 
-### Testing
+🎉 Congratulations! You have completed the development environment setup.
 
-Although vllm-ascend CI provide integration test on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can run it
-locally. The simplest way to run these integration tests locally is through a container:
-
-```bash
-# Under Ascend NPU environment
-git clone https://github.com/vllm-project/vllm-ascend.git
-cd vllm-ascend
-
-export IMAGE=vllm-ascend-dev-image
-export CONTAINER_NAME=vllm-ascend-dev
-export DEVICE=/dev/davinci1
-
-# The first build will take about 10 mins (10MB/s) to download the base image and packages
-docker build -t $IMAGE -f ./Dockerfile .
-# You can also specify the mirror repo via setting VLLM_REPO to speedup
-# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
-
-docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
-    --device /dev/davinci_manager --device /dev/devmm_svm \
-    --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -ti $IMAGE bash
-
-cd vllm-ascend
-pip install -r requirements-dev.txt
-
-pytest tests/
-```
-
-
-### Run doctest
-
-vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` command to run all doctests in the doc files.
-The doctest is a good way to make sure the docs are up to date and the examples are executable, you can run it locally as follows:
-
-```{code-block} bash
-:substitutions:
-
-# Update DEVICE according to your device (/dev/davinci[0-7])
-export DEVICE=/dev/davinci0
-# Update the vllm-ascend image
-export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
-docker run --rm \
-    --name vllm-ascend \
-    --device $DEVICE \
-    --device /dev/davinci_manager \
-    --device /dev/devmm_svm \
-    --device /dev/hisi_hdc \
-    -v /usr/local/dcmi:/usr/local/dcmi \
-    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-    -v /etc/ascend_install.info:/etc/ascend_install.info \
-    -v /root/.cache:/root/.cache \
-    -p 8000:8000 \
-    -it $IMAGE bash
-
-# Run doctest
-/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
-```
-
-This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).
+### Test locally
 
+You can refer to the [Testing](./testing.md) doc to help you set up the testing environment and run tests locally.
 
 ## DCO and Signed-off-by

docs/source/developer_guide/testing.md (new file)

Lines changed: 183 additions & 0 deletions

# Testing

This section explains how to write e2e tests and unit tests to verify the implementation of your feature.

## Setup test environment

The fastest way to set up the test environment is to use the main branch container image:

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```{code-block} bash
:substitutions:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```

::::

::::{tab-item} Multi cards
:sync: multi

```{code-block} bash
:substitutions:
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:main
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```
::::

:::::

After starting the container, you should install the required packages:

```bash
# Configure a pip mirror (optional, speeds up downloads in some regions)
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install required packages
pip install -r requirements-dev.txt
```

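Optionally, run a quick sanity check (assuming the container was started as shown above) to confirm that vLLM imports cleanly and the NPUs are visible:

```bash
# vLLM should import and print its version; npu-smi should list the mounted devices
python -c "import vllm; print(vllm.__version__)"
npu-smi info
```
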
## Running tests

### Unit test

There are several principles to follow when writing unit tests:

- The test file path should mirror the source file path and use the `test_` prefix, e.g.: `vllm_ascend/worker/worker_v1.py` --> `tests/ut/worker/test_worker_v1.py`
- The vLLM Ascend tests use the unittest framework; see [here](https://docs.python.org/3/library/unittest.html#module-unittest) to learn how to write unit tests.
- All unit tests must be able to run on CPU, so mock the device-related functions so that they run on the host (a minimal sketch follows the commands below).
  - Example: [tests/ut/test_ascend_config.py](https://github.com/vllm-project/vllm-ascend/blob/main/tests/ut/test_ascend_config.py).
- You can run the unit tests using `pytest`:

```bash
cd /vllm-workspace/vllm-ascend/
# Run all unit tests
pytest -sv tests/ut

# Run a single test file
pytest -sv tests/ut/test_ascend_config.py
```

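Below is a minimal, self-contained sketch of a unit test following these conventions. It is illustrative only: `_npu_device_count` and `pick_tensor_parallel_size` are placeholder helpers, not functions from the repo; the point is that the device-touching call is mocked so the test passes on a CPU-only host.

```python
# tests/ut/worker/test_worker_v1.py (illustrative sketch)
import unittest
from unittest import mock


def _npu_device_count() -> int:
    """Placeholder for a device query; the real call would fail on a CPU-only host."""
    raise RuntimeError("No NPU available on this host")


def pick_tensor_parallel_size(requested: int) -> int:
    """Clamp the requested tensor-parallel size to the number of visible NPUs."""
    return min(requested, _npu_device_count())


class TestPickTensorParallelSize(unittest.TestCase):

    def test_clamps_to_device_count(self):
        # Mock the device-related call so the test runs anywhere, including CPU-only CI.
        with mock.patch(f"{__name__}._npu_device_count", return_value=4):
            self.assertEqual(pick_tensor_parallel_size(8), 4)
            self.assertEqual(pick_tensor_parallel_size(2), 2)


if __name__ == "__main__":
    unittest.main()
```

In a real test the helpers would be imported from `vllm_ascend` and the `mock.patch` target would point at the module under test, as in the `test_ascend_config.py` example above.
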
### E2E test

Although the vllm-ascend CI runs [e2e tests](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml) on Ascend hardware, you can also run them locally.

:::::{tab-set}
:sync-group: e2e

::::{tab-item} Single card
:selected:
:sync: single

```bash
cd /vllm-workspace/vllm-ascend/
# Run all single card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models
```
::::

::::{tab-item} Multi cards
:sync: multi
```bash
cd /vllm-workspace/vllm-ascend/
# Run all multi card tests
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/

# Run a certain test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_dynamic_npugraph_batchsize.py

# Run a certain case in a test script
VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/multicard/test_offline_inference.py::test_models
```
::::

:::::

This reproduces the e2e tests run in CI: [vllm_ascend_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml).

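If you are adding a new e2e test, the snippet below sketches the typical shape of an offline single-card test. The file name, model, and assertions are placeholders chosen for illustration; see the examples listed below for real tests from the repo.

```python
# tests/e2e/singlecard/test_my_feature.py (illustrative sketch)
import pytest
from vllm import LLM, SamplingParams

MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]


@pytest.mark.parametrize("model", MODELS)
def test_basic_generation(model: str) -> None:
    # Greedy sampling keeps the expected output stable across runs.
    sampling_params = SamplingParams(temperature=0, max_tokens=16)
    llm = LLM(model=model, max_model_len=1024)
    outputs = llm.generate(["Hello, my name is"], sampling_params)
    assert len(outputs) == 1
    assert len(outputs[0].outputs[0].text) > 0
```

Run it the same way as the other single-card tests, e.g. `VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_my_feature.py`.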

#### E2E test examples

- Offline test example: [`tests/e2e/singlecard/test_offline_inference.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_offline_inference.py)
- Online test example: [`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)
- Correctness test example: [`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/singlecard/test_aclgraph.py)
- Reduced-layer model test example: [test_torchair_graph_mode.py - DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-ascend/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)

CI resources are limited, so you might need to reduce the number of layers of the model. Below is an example of how to generate a reduced-layer model:
1. Fork the original model repo on ModelScope; we need all the files in the repo except for the weights.
2. Set `num_hidden_layers` to the expected number of layers in `config.json`, e.g., `{"num_hidden_layers": 2,}`.
3. Copy the following Python script as `generate_random_weight.py`. Set the parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and `DIST_MODEL_PATH` as needed (an optional sanity check follows the script):

```python
import os

import torch
from transformers import AutoConfig

# modeling_deepseek.py comes from the model repo forked in step 1.
from modeling_deepseek import DeepseekV3ForCausalLM

MODEL_LOCAL_PATH = "~/.cache/modelscope/models/vllm-ascend/DeepSeek-V3-Pruning"
DIST_DTYPE = torch.bfloat16
DIST_MODEL_PATH = "./random_deepseek_v3_with_2_hidden_layer"

# Load the pruned config, build a randomly initialized model from it, and save the checkpoint.
config = AutoConfig.from_pretrained(os.path.expanduser(MODEL_LOCAL_PATH), trust_remote_code=True)
model = DeepseekV3ForCausalLM(config)
model = model.to(DIST_DTYPE)
model.save_pretrained(DIST_MODEL_PATH)
```
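
As an optional sanity check, you can confirm that the saved checkpoint really has the reduced layer count by reading the generated `config.json` (the path below matches `DIST_MODEL_PATH` from the script above):

```python
import json

with open("./random_deepseek_v3_with_2_hidden_layer/config.json") as f:
    saved_config = json.load(f)

print(saved_config["num_hidden_layers"])  # expect 2 for the pruned model
```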

### Run doctest

vllm-ascend provides a `vllm-ascend/tests/e2e/run_doctests.sh` script to run all doctests in the doc files.
The doctest is a good way to make sure the docs are up to date and the examples are executable. You can run it locally as follows:

```bash
# Run doctest
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh
```

This will reproduce the same environment as the CI: [vllm_ascend_doctest.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_doctest.yaml).

docs/source/faqs.md

Lines changed: 37 additions & 0 deletions
@@ -126,3 +126,40 @@ And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after the tenso

### 17. Failed to reinstall vllm-ascend from source after uninstalling vllm-ascend?
You may encounter the problem of C compilation failure when reinstalling vllm-ascend from source using pip. If the installation fails, it is recommended to use `python setup.py install` to install, or use `python setup.py clean` to clear the cache.

### 18. How to generate deterministic results when using vllm-ascend?
There are several factors that affect output certainty:

1. Sampling method: use **greedy sampling** by setting `temperature=0` in `SamplingParams`, e.g.:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

2. Set the following environment variables (a combined serving example follows this list):

```bash
export LCCL_DETERMINISTIC=1
export HCCL_DETERMINISTIC=1
export ATB_MATMUL_SHUFFLE_K_ENABLE=0
export ATB_LLM_LCOC_ENABLE=0
```
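
For online serving, export the same variables in the shell that launches the server. A combined example (the model name is just an illustration):

```bash
# Export the determinism-related variables, then start the OpenAI-compatible server
export LCCL_DETERMINISTIC=1
export HCCL_DETERMINISTIC=1
export ATB_MATMUL_SHUFFLE_K_ENABLE=0
export ATB_LLM_LCOC_ENABLE=0
vllm serve Qwen/Qwen2.5-0.5B-Instruct
```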

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ user_guide/release_notes
 :caption: Developer Guide
 :maxdepth: 1
 developer_guide/contributing
+developer_guide/testing
 developer_guide/versioning_policy
 developer_guide/evaluation/index
 :::
