
Commit 53ec583

leo-pony and Yikun authored
[Docs] Update Atlas 300I series doc and fix CI lint (#1537)
### What this PR does / why we need it?
- Update Atlas 300I series doc: clean up unused parameters and enable optimized ops
- Fix code spell CI

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
1 parent a054f0f commit 53ec583

3 files changed: +20 −42 lines


.github/workflows/doc_codespell.yaml

Lines changed: 1 addition & 1 deletion
@@ -28,6 +28,6 @@ jobs:
       - name: Run codespell check
         run: |
           CODESPELL_EXCLUDES=('--skip' 'tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,build/**,./vllm_ascend.egg-info/**')
-          CODESPELL_IGNORE_WORDS=('-L' 'CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn')
+          CODESPELL_IGNORE_WORDS=('-L' 'CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn,assertIn')

           codespell --toml pyproject.toml "${CODESPELL_EXCLUDES[@]}" "${CODESPELL_IGNORE_WORDS[@]}"
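For reference, the same check can be reproduced outside CI. The following is a minimal Python sketch, assuming `codespell` is installed and the command is run from the repository root; the subprocess wrapper itself is illustrative and not part of this patch:

```python
import subprocess

# Mirrors the CI step above: skip generated/vendored paths and ignore
# Ascend-specific terms (including the newly added "assertIn").
excludes = [
    "--skip",
    "tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,"
    "build/**,./vllm_ascend.egg-info/**",
]
ignore_words = ["-L", "CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn,assertIn"]

result = subprocess.run(
    ["codespell", "--toml", "pyproject.toml", *excludes, *ignore_words],
    check=False,  # a non-zero exit code means spelling issues were found
)
print("codespell exit code:", result.returncode)
```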

docs/source/tutorials/single_node_300i.md

Lines changed: 18 additions & 40 deletions
@@ -61,31 +61,24 @@ Run the following command to start the vLLM server:
 ```{code-block} bash
 :substitutions:
 export VLLM_USE_V1=1
-export MODEL="Qwen/Qwen3-0.6B"
-python -m vllm.entrypoints.api_server \
-    --model $MODEL \
+vllm serve Qwen/Qwen3-0.6B \
     --tensor-parallel-size 1 \
-    --max-num-batched-tokens 2048 \
-    --gpu-memory-utilization 0.5 \
-    --max-num-seqs 4 \
     --enforce-eager \
-    --trust-remote-code \
-    --max-model-len 1024 \
-    --disable-custom-all-reduce \
     --dtype float16 \
-    --port 8000 \
-    --compilation-config '{"custom_ops":["+rms_norm", "+rotary_embedding"]}'
+    --compilation-config '{"custom_ops":["none", "+rms_norm", "+rotary_embedding"]}'
 ```

 Once your server is started, you can query the model with input prompts

 ```bash
-curl http://localhost:8000/generate \
+curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
-        "prompt": "Hello, my name is ?",
-        "max_tokens": 20,
-        "temperature": 0
+        "prompt": "The future of AI is",
+        "max_tokens": 64,
+        "top_p": 0.95,
+        "top_k": 50,
+        "temperature": 0.6
     }'
 ```
 ::::
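Since the tutorial now targets vLLM's OpenAI-compatible `/v1/completions` endpoint, the same request can also be issued from Python. A minimal sketch, assuming the server above is running on localhost:8000 and the third-party `requests` package is available (the Python client is illustrative and not part of this patch):

```python
import requests

# Same request body as the curl example above, sent from Python.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        # Add "model": "Qwen/Qwen3-0.6B" here if the server requires an explicit model name.
        "prompt": "The future of AI is",
        "max_tokens": 64,
        "top_p": 0.95,
        "top_k": 50,
        "temperature": 0.6,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```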
@@ -98,31 +91,24 @@ Run the following command to start the vLLM server:
 ```{code-block} bash
 :substitutions:
 export VLLM_USE_V1=1
-export MODEL="Qwen/Qwen2.5-7B-Instruct"
-python -m vllm.entrypoints.api_server \
-    --model $MODEL \
+vllm serve Qwen/Qwen2.5-7B-Instruct \
     --tensor-parallel-size 2 \
-    --max-num-batched-tokens 2048 \
-    --gpu-memory-utilization 0.5 \
-    --max-num-seqs 4 \
     --enforce-eager \
-    --trust-remote-code \
-    --max-model-len 1024 \
-    --disable-custom-all-reduce \
     --dtype float16 \
-    --port 8000 \
-    --compilation-config '{"custom_ops":["+rms_norm", "+rotary_embedding"]}'
+    --compilation-config '{"custom_ops":["none", "+rms_norm", "+rotary_embedding"]}'
 ```

 Once your server is started, you can query the model with input prompts

 ```bash
-curl http://localhost:8000/generate \
+curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
-        "prompt": "Hello, my name is ?",
-        "max_tokens": 20,
-        "temperature": 0
+        "prompt": "The future of AI is",
+        "max_tokens": 64,
+        "top_p": 0.95,
+        "top_k": 50,
+        "temperature": 0.6
     }'
 ```

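Alternatively, the official `openai` Python client works against the same OpenAI-compatible endpoint. A minimal sketch, assuming the `openai` package is installed and the two-card server above is running; the client usage is illustrative and not part of this patch:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt="The future of AI is",
    max_tokens=64,
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 50},  # top_k is a vLLM-specific sampling parameter
)
print(completion.choices[0].text)
```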
@@ -206,14 +192,10 @@ sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
 # Create an LLM.
 llm = LLM(
     model="Qwen/Qwen3-0.6B",
-    max_model_len=4096,
-    max_num_seqs=4,
-    trust_remote_code=True,
     tensor_parallel_size=1,
     enforce_eager=True, # For 300I series, only eager mode is supported.
     dtype="float16", # IMPORTANT cause some ATB ops cannot support bf16 on 300I series
-    disable_custom_all_reduce=True, # IMPORTANT cause 300I series needed
-    compilation_config={"custom_ops":["+rms_norm", "+rotary_embedding"]}, # IMPORTANT cause 300I series needed custom ops
+    compilation_config={"custom_ops":["none", "+rms_norm", "+rotary_embedding"]}, # High performance for 300I series
 )
 # Generate texts from the prompts.
 outputs = llm.generate(prompts, sampling_params)
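For context, the offline-inference example from the tutorial looks roughly as follows after this change. This is a minimal sketch assembled from the hunk above and the surrounding tutorial text; the prompt list is illustrative, while the `LLM(...)` arguments match the updated doc:

```python
from vllm import LLM, SamplingParams

# Example prompts (illustrative).
prompts = [
    "Hello, my name is",
    "The future of AI is",
]

# Sampling configuration used by the tutorial.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)

# Create an LLM with the updated 300I series settings.
llm = LLM(
    model="Qwen/Qwen3-0.6B",
    tensor_parallel_size=1,
    enforce_eager=True,  # For 300I series, only eager mode is supported.
    dtype="float16",     # Some ATB ops cannot support bf16 on 300I series.
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts and print them.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```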
@@ -253,14 +235,10 @@ sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
 # Create an LLM.
 llm = LLM(
     model="Qwen/Qwen2.5-7B-Instruct",
-    max_model_len=4096,
-    max_num_seqs=4,
-    trust_remote_code=True,
     tensor_parallel_size=2,
     enforce_eager=True, # For 300I series, only eager mode is supported.
     dtype="float16", # IMPORTANT cause some ATB ops cannot support bf16 on 300I series
-    disable_custom_all_reduce=True, # IMPORTANT cause 300I series needed
-    compilation_config={"custom_ops":["+rms_norm", "+rotary_embedding"]}, # IMPORTANT cause 300I series needed custom ops
+    compilation_config={"custom_ops":["none", "+rms_norm", "+rotary_embedding"]}, # High performance for 300I series
 )
 # Generate texts from the prompts.
 outputs = llm.generate(prompts, sampling_params)

format.sh

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ CODESPELL_EXCLUDES=(
 )

 CODESPELL_IGNORE_WORDS=(
-  '-L' 'CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn'
+  '-L' 'CANN,cann,NNAL,nnal,ASCEND,ascend,EnQue,CopyIn,assertIn'
 )

 # check spelling of specified files
