
Commit 097e714

Yikun, farawayboat, angazenn, wangxiyuan, leo-pony authored
[Platform] Add initial experimental support for Atlas 300I series (vllm-project#1333)
### What this PR does / why we need it?

Add initial experimental support for Ascend 310P. This patch squashes the PRs below into one to help validation:
- vllm-project#914
- vllm-project#1318
- vllm-project#1327

### Does this PR introduce _any_ user-facing change?

Users can run vLLM on the Atlas 300I Duo series.

### How was this patch tested?

CI passed with:
- E2E image build for 310P
- CI test on A2 with e2e test and long-term test
- Unit tests are missing because a real 310P image is needed for them; they will be added in a separate PR later.
- Manual e2e tests:
  - Qwen2.5-7B-Instruct, Qwen2.5-0.5B, Qwen3-0.6B, Qwen3-4B, Qwen3-8B: vllm-project#914 (comment)
  - Pangu MGoE 72B

The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.

#### ENV information

CANN, NNAL version: 8.1.RC1

> [!IMPORTANT]
> PTA 2.5.1 version >= torch_npu-2.5.1.post1.dev20250528 is required to support the NZ format and to call NNAL operators on 310P.

#### Code example

##### Build vllm-ascend from source code

```shell
# download source code as vllm-ascend
cd vllm-ascend
export SOC_VERSION=Ascend310P3
pip install -v -e .
cd ..
```

##### Run offline inference

```python
from vllm import LLM, SamplingParams

# Sample prompts (Chinese): "Is the boiling point of water 100 °C? Answer yes or no."
# and "If the armpit temperature is 38 °C, does this person have a fever? Answer yes or no."
prompts = [
    "水的沸点是100摄氏度吗?请回答是或者否。",
    "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
    "水的沸点是100摄氏度吗?请回答是或者否。",
    "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)

# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops cannot support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

---------

Signed-off-by: Vincent Yuan <farawayboat@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: Vincent Yuan <farawayboat@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: shen-shanshan <467638484@qq.com>
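The `dtype="float16"` setting in the example above matters on 310P, since some ATB operators there have no bf16 kernels. A minimal sketch of the kind of guard client code might add (the `validate_dtype_for_soc` helper is hypothetical, not part of vllm-ascend):

```python
def validate_dtype_for_soc(dtype: str, soc_version: str) -> str:
    """Hypothetical guard: reject dtypes known not to work on a given SoC.

    On 310P (Atlas 300I series), some ATB operators lack bf16 kernels,
    so bfloat16 is rejected in favor of float16.
    """
    if soc_version.upper().startswith("ASCEND310P") and dtype == "bfloat16":
        raise ValueError(
            f"dtype={dtype!r} is not supported on {soc_version}; use 'float16'"
        )
    return dtype

print(validate_dtype_for_soc("float16", "Ascend310P3"))  # float16 is accepted
```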
1 parent 2009fdb commit 097e714

23 files changed: +839 −62 lines
File renamed without changes.
.github/workflows/image_310p_openeuler.yml

Lines changed: 114 additions & 0 deletions (new file)

```yaml
name: 'image / openEuler'
# This is a docker build check and publish job:
# 1. PR triggered docker image build check
#    - is for image build check
#    - enabled on main/*-dev branches
#    - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branch push triggers image publish
#    - is for branch/dev/nightly image
#    - commits merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tag push triggers image publish
#    - is for final release image
#    - published when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3-openeuler|latest / vllm-ascend:v1.2.3rc1-openeuler
on:
  pull_request:
    branches:
      - 'main'
      - '*-dev'
    paths:
      - '.github/workflows/image_310p_openeuler.yml'
      - 'Dockerfile.310p.openEuler'
      - 'vllm_ascend/**'
      - 'setup.py'
      - 'pyproject.toml'
      - 'requirements.txt'
      - 'cmake/**'
      - 'CMakeLists.txt'
      - 'csrc/**'
  push:
    # Publish image when tagging; the Dockerfile in the tag will be built as the tag image
    branches:
      - 'main'
      - '*-dev'
    tags:
      - 'v*'
    paths:
      - '.github/workflows/image_310p.openeuler.yml'
      - 'Dockerfile.310p.openEuler'
      - 'vllm_ascend/**'

jobs:
  build:
    name: vllm-ascend image build
    runs-on: >-
      ${{
        github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
        'ubuntu-latest' ||
        'ubuntu-24.04-arm'
      }}
    steps:
      - uses: actions/checkout@v4

      - name: Print
        run: |
          lscpu

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          # TODO(yikun): add more hub images and a note on the release policy for container images
          images: |
            quay.io/ascend/vllm-ascend
          # Note for test case
          # https://github.com/marketplace/actions/docker-metadata-action#typeref
          # 1. branch job publishes per main/*-dev branch commit
          # 2. main and dev pull_request is build only, so the tag pr-N-openeuler is fine
          # 3. only pep440-matched tags will be published:
          #    - v0.7.1 --> v0.7.1-openeuler, latest
          #    - pre/post/dev: v0.7.1rc1-openeuler / v0.7.1rc1.dev1-openeuler / v0.7.1.post1-openeuler, no latest
          #    which follows the rule from vLLM with prefix v
          # TODO(yikun): the post release might be considered as latest release
          tags: |
            type=ref,event=branch,suffix=-310p-openeuler
            type=ref,event=pr,suffix=-openeuler
            type=pep440,pattern={{raw}},suffix=-310p-openeuler

      - name: Free up disk space
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          docker-images: false

      - name: Build - Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Build - Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Publish - Login to Quay Container Registry
        if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ vars.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}

      - name: Build and push 310p
        uses: docker/build-push-action@v6
        with:
          platforms: >-
            ${{
              github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
              'linux/amd64,linux/arm64' ||
              'linux/arm64'
            }}
          # use the current repo path as the build context, ensure .git is contained
          context: .
          # only trigger when tag, branch/main push
          push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
          labels: ${{ steps.meta.outputs.labels }}
          tags: ${{ steps.meta.outputs.tags }}
          file: Dockerfile.310p.openEuler
          build-args: |
            PIP_INDEX_URL=https://pypi.org/simple
```
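The tagging rules described in the workflow comments (a final pep440 tag also gets `latest`; rc/dev/post releases do not) can be sketched as follows. This is a simplified illustration, not the actual docker/metadata-action logic, and `image_tags` is a hypothetical helper:

```python
import re

# A "final" release here means v + digits and dots only, e.g. v0.7.1;
# rc/dev/post segments (v0.7.1rc1, v0.7.1.post1) do not match.
FINAL_PEP440 = re.compile(r"^v\d+(\.\d+)*$")

def image_tags(git_tag: str, suffix: str = "-310p-openeuler") -> list:
    """Map a pushed git tag to image tags, per the workflow comments above."""
    tags = [f"{git_tag}{suffix}"]
    if FINAL_PEP440.match(git_tag):
        tags.append("latest")  # only final releases are tagged latest
    return tags

print(image_tags("v0.7.1"))     # ['v0.7.1-310p-openeuler', 'latest']
print(image_tags("v0.7.1rc1"))  # ['v0.7.1rc1-310p-openeuler']
```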
.github/workflows/image_310p_ubuntu.yml

Lines changed: 110 additions & 0 deletions (new file)

```yaml
name: 'image / Ubuntu'
# This is a docker build check and publish job:
# 1. PR triggered docker image build check
#    - is for image build check
#    - enabled on main/*-dev branches
#    - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branch push triggers image publish
#    - is for branch/dev/nightly image
#    - commits merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tag push triggers image publish
#    - is for final release image
#    - published when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3|latest / vllm-ascend:v1.2.3rc1
on:
  pull_request:
    branches:
      - 'main'
      - '*-dev'
    paths:
      - '.github/workflows/image_310p.ubuntu.yml'
      - 'Dockerfile.310p'
      - 'vllm_ascend/**'
      - 'setup.py'
      - 'pyproject.toml'
      - 'requirements.txt'
      - 'cmake/**'
      - 'CMakeLists.txt'
      - 'csrc/**'
  push:
    # Publish image when tagging; the Dockerfile in the tag will be built as the tag image
    branches:
      - 'main'
      - '*-dev'
    tags:
      - 'v*'
    paths:
      - '.github/workflows/image_310p_ubuntu.yml'
      - 'Dockerfile.310p'
      - 'vllm_ascend/**'

jobs:
  build:
    name: vllm-ascend image build
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Print
        run: |
          lscpu

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          # TODO(yikun): add more hub images and a note on the release policy for container images
          images: |
            quay.io/ascend/vllm-ascend
          # Note for test case
          # https://github.com/marketplace/actions/docker-metadata-action#typeref
          # 1. branch job publishes per main/*-dev branch commit
          # 2. main and dev pull_request is build only, so the tag pr-N is fine
          # 3. only pep440-matched tags will be published:
          #    - v0.7.1 --> v0.7.1, latest
          #    - pre/post/dev: v0.7.1rc1 / v0.7.1rc1.dev1 / v0.7.1.post1, no latest
          #    which follows the rule from vLLM with prefix v
          # TODO(yikun): the post release might be considered as latest release
          tags: |
            type=ref,event=branch,suffix=-310p
            type=ref,event=pr,suffix=-310p
            type=pep440,pattern={{raw}},suffix=-310p

      - name: Free up disk space
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          docker-images: false

      - name: Build - Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Build - Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Publish - Login to Quay Container Registry
        if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ vars.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}

      - name: Build and push 310p
        uses: docker/build-push-action@v6
        with:
          platforms: >-
            ${{
              github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
              'linux/amd64,linux/arm64' ||
              'linux/amd64'
            }}
          # use the current repo path as the build context, ensure .git is contained
          context: .
          file: Dockerfile.310p
          # only trigger when tag, branch/main push
          push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
          labels: ${{ steps.meta.outputs.labels }}
          tags: ${{ steps.meta.outputs.tags }}
          build-args: |
            PIP_INDEX_URL=https://pypi.org/simple
```
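GitHub Actions expressions have no ternary operator, so the workflows above select `runs-on` and `platforms` with the `cond && 'x' || 'y'` idiom. A rough shell analogue of that pattern (with the same known caveat: it falls through to `'y'` whenever `'x'` is empty):

```shell
# Emulate the workflows' `cond && 'x' || 'y'` selection in plain shell.
# is_push stands in for: github.event_name == 'push' && repository_owner == 'vllm-project'
is_push=true
platforms=$([ "$is_push" = true ] && echo 'linux/amd64,linux/arm64' || echo 'linux/arm64')
echo "$platforms"  # linux/amd64,linux/arm64
```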

.github/workflows/image_openeuler.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -94,7 +94,7 @@ jobs:
           username: ${{ vars.QUAY_USERNAME }}
           password: ${{ secrets.QUAY_PASSWORD }}
 
-      - name: Build and push
+      - name: Build and push 910b
         uses: docker/build-push-action@v6
         with:
           platforms: >-
```

.github/workflows/image_ubuntu.yml

Lines changed: 2 additions & 1 deletion

```diff
@@ -90,7 +90,7 @@ jobs:
           username: ${{ vars.QUAY_USERNAME }}
           password: ${{ secrets.QUAY_PASSWORD }}
 
-      - name: Build and push
+      - name: Build and push 910b
         uses: docker/build-push-action@v6
         with:
           platforms: >-
@@ -101,6 +101,7 @@ jobs:
           }}
           # use the current repo path as the build context, ensure .git is contained
           context: .
+          file: Dockerfile
           # only trigger when tag, branch/main push
           push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
           labels: ${{ steps.meta.outputs.labels }}
```

Dockerfile.310p

Lines changed: 61 additions & 0 deletions (new file)

```dockerfile
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

FROM quay.io/ascend/cann:8.1.rc1-310p-ubuntu22.04-py3.10

ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
ARG COMPILE_CUSTOM_KERNELS=1

# Define environments
ENV DEBIAN_FRONTEND=noninteractive
ENV COMPILE_CUSTOM_KERNELS=${COMPILE_CUSTOM_KERNELS}

RUN apt-get update -y && \
    apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev && \
    rm -rf /var/cache/apt/* && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

COPY . /vllm-workspace/vllm-ascend/

RUN pip config set global.index-url ${PIP_INDEX_URL}

# Install vLLM
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.9.1
RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /vllm-workspace/vllm
# On x86, triton is installed by vllm, but on Ascend triton doesn't work correctly, so we need to uninstall it.
RUN VLLM_TARGET_DEVICE="empty" python3 -m pip install -v -e /vllm-workspace/vllm/ --extra-index https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip uninstall -y triton && \
    python3 -m pip cache purge

# Install vllm-ascend
# Append `libascend_hal.so` path (devlib) to LD_LIBRARY_PATH
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
    source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    source /usr/local/Ascend/nnal/atb/set_env.sh && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    export SOC_VERSION=ASCEND310P3 && \
    python3 -m pip install -v -e /vllm-workspace/vllm-ascend/ --extra-index https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip cache purge

# Install modelscope (for fast download) and ray (for multinode)
RUN python3 -m pip install modelscope ray && \
    python3 -m pip cache purge

CMD ["/bin/bash"]
```
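Both the Dockerfile above and the source-build instructions export `SOC_VERSION` before installing vllm-ascend, and the build uses it to decide which custom kernels to compile. A purely illustrative sketch of reading such a flag (`resolve_soc_version` is a hypothetical helper, not vllm-ascend's actual setup code, and the default value is an assumption):

```python
import os

def resolve_soc_version(environ=None, default="Ascend910B1"):
    """Illustrative sketch: read the target Ascend SoC from SOC_VERSION.

    Ascend310P3 targets the Atlas 300I series; the default here is an
    assumption chosen for illustration only.
    """
    environ = os.environ if environ is None else environ
    soc = environ.get("SOC_VERSION", default)
    # 310P SoCs share the ASCEND310P prefix regardless of the trailing digit.
    return soc, soc.upper().startswith("ASCEND310P")

print(resolve_soc_version({"SOC_VERSION": "Ascend310P3"}))  # ('Ascend310P3', True)
```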

Dockerfile.310p.openEuler

Lines changed: 58 additions & 0 deletions (new file)

```dockerfile
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#

FROM quay.io/ascend/cann:8.1.rc1-310p-openeuler22.03-py3.10

ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
ARG COMPILE_CUSTOM_KERNELS=1

ENV COMPILE_CUSTOM_KERNELS=${COMPILE_CUSTOM_KERNELS}

RUN yum update -y && \
    yum install -y python3-pip git vim wget net-tools gcc gcc-c++ make cmake numactl-devel && \
    rm -rf /var/cache/yum

RUN pip config set global.index-url ${PIP_INDEX_URL}

WORKDIR /workspace

COPY . /vllm-workspace/vllm-ascend/

# Install vLLM
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.9.1

RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /vllm-workspace/vllm
# On x86, triton is installed by vllm, but on Ascend triton doesn't work correctly, so we need to uninstall it.
RUN VLLM_TARGET_DEVICE="empty" python3 -m pip install -e /vllm-workspace/vllm/ --extra-index https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip uninstall -y triton && \
    python3 -m pip cache purge

# Install vllm-ascend
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
    source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    source /usr/local/Ascend/nnal/atb/set_env.sh && \
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
    export SOC_VERSION=ASCEND310P3 && \
    python3 -m pip install -v -e /vllm-workspace/vllm-ascend/ --extra-index https://download.pytorch.org/whl/cpu/ && \
    python3 -m pip cache purge

# Install modelscope (for fast download) and ray (for multinode)
RUN python3 -m pip install modelscope ray && \
    python3 -m pip cache purge

CMD ["/bin/bash"]
```
