Upstream 091 eplb dynamic #1663

Closed
Wants to merge 195 commits

Commits (195)
2a94321
add eplb policy
Jun 10, 2025
e91956f
add eplb updator
Jun 10, 2025
66b3d2e
implementation of VllmEplbAdaptor and D2DExpertWeightLoader
wanghanqingLYT Jun 10, 2025
05a536c
add eplb policy and updator
raindaywhu Jun 10, 2025
24ca412
Merge pull request #39 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 10, 2025
86fe2c0
determine num_dense_layers and num_moe_layers by refering to model co…
wanghanqingLYT Jun 10, 2025
caeaf2c
Merge pull request #41 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 10, 2025
e68e522
EPLB add eplb_worker
qmkakaxi Jun 10, 2025
f450936
Merge pull request #42 from raindaywhu/dev_mereg_wjh
qmkakaxi Jun 10, 2025
d639144
add ssd loader
qmkakaxi Jun 10, 2025
f1f936b
EPLB moed load collect
qmkakaxi Jun 10, 2025
bd924f2
delete invalida import
qmkakaxi Jun 10, 2025
7e9bb54
Merge pull request #43 from raindaywhu/dev_mereg_wjh
qmkakaxi Jun 10, 2025
afcce8e
fix bugs in fused_experts_with_all2all
wanghanqingLYT Jun 11, 2025
bca9b34
Merge pull request #44 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 11, 2025
cc88ea7
add eplb tabel generator
Jun 11, 2025
f2d0a75
add eplb tabel generator
raindaywhu Jun 11, 2025
78079a7
Adapt static EPLB
qmkakaxi Jun 11, 2025
22f03db
Merge branch 'master' into br_wjh_eplb
qmkakaxi Jun 11, 2025
485e3d0
add enable_eplb in ascend_config
qmkakaxi Jun 11, 2025
9e5e117
enable_eplb -> dynamic_eplb
qmkakaxi Jun 11, 2025
c28f6cb
add eplb policy
Jun 10, 2025
839cab1
add eplb updator
Jun 10, 2025
6ad801d
implementation of VllmEplbAdaptor and D2DExpertWeightLoader
wanghanqingLYT Jun 10, 2025
34109ac
determine num_dense_layers and num_moe_layers by refering to model co…
wanghanqingLYT Jun 10, 2025
c15f8a8
EPLB add eplb_worker
qmkakaxi Jun 10, 2025
1400ea3
add ssd loader
qmkakaxi Jun 10, 2025
28af393
EPLB moed load collect
qmkakaxi Jun 10, 2025
aa619f4
delete invalida import
qmkakaxi Jun 10, 2025
6fc343c
fix bugs in fused_experts_with_all2all
wanghanqingLYT Jun 11, 2025
b97e066
Adapt static EPLB
qmkakaxi Jun 11, 2025
474a5c3
add eplb tabel generator
Jun 11, 2025
807348f
add enable_eplb in ascend_config
qmkakaxi Jun 11, 2025
85d29c5
enable_eplb -> dynamic_eplb
qmkakaxi Jun 11, 2025
e4172aa
fix bugs in dynamioc eplb
wanghanqingLYT Jun 14, 2025
264a3a5
delete print in funsed_moe forward
wanghanqingLYT Jun 14, 2025
7334158
Merge pull request #52 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
a5dd04f
fix bugs caused by variable name old_placemet
wanghanqingLYT Jun 14, 2025
60c87b0
Merge pull request #53 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
241b722
move get_init_expert_map to forward_before
wanghanqingLYT Jun 14, 2025
b888701
Merge pull request #54 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
441508c
fix bug in log2phy in dynamic w8a8
wanghanqingLYT Jun 14, 2025
627757e
Merge pull request #55 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
88bb99d
fix bug for dim of updated_log2phy_map
wanghanqingLYT Jun 14, 2025
3913395
Merge pull request #56 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
dd55ce2
Merge remote-tracking branch 'origin/br_whq_eplb_main' into br_wjh_eplb
qmkakaxi Jun 16, 2025
e86964e
add dynamic_ep alg.
qmkakaxi Jun 16, 2025
9d1893a
fxi bugs
qmkakaxi Jun 16, 2025
230fd9c
Merge pull request #57 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 16, 2025
0992bee
fix eplb update log
raindaywhu Jun 16, 2025
91ff797
Merge pull request #59 from raindaywhu/cy_eplb
raindaywhu Jun 16, 2025
0c6210f
fix bugsw
qmkakaxi Jun 16, 2025
3b7fd9b
Merge pull request #60 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 16, 2025
9d63949
improve the implement of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
0269ef6
Merge pull request #61 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 16, 2025
ae01e08
add compose_expert_update_info_bipartite
qmkakaxi Jun 17, 2025
4b5cd84
adapt compose_expert_update_info_bipartite into eplb process
wanghanqingLYT Jun 17, 2025
74fe5ff
Merge branch 'br_whq_eplb_main' into br_wjh_eplb
wanghanqingLYT Jun 17, 2025
b7bfcc9
Merge pull request #62 from raindaywhu/br_wjh_eplb
wanghanqingLYT Jun 17, 2025
d5dc946
fix bugsw
qmkakaxi Jun 16, 2025
86be76f
improve the implement of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
03abde3
move generate log2ph map to eplb_worker
raindaywhu Jun 17, 2025
7f77443
fix bugsw
qmkakaxi Jun 16, 2025
0b8d00a
improve the implement of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
447360f
avoid frequetly synchronize between device and cpu when accessing to …
wanghanqingLYT Jun 17, 2025
acf2aee
Merge pull request #63 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 17, 2025
40ee72f
add gate for cacluate moe load
qmkakaxi Jun 17, 2025
baacad8
Merge pull request #64 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 17, 2025
4380fdd
fix log2phy
raindaywhu Jun 17, 2025
556169d
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
347f60c
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
49efd9b
fix bugs in expert_map_per_layer_cpu
wanghanqingLYT Jun 17, 2025
352dbca
Merge pull request #66 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 17, 2025
d18aef1
fix log2phy
raindaywhu Jun 17, 2025
d6a76f8
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
53d0218
fix log2phy
Jun 17, 2025
af10b4a
mv log2phy into eplb worker
raindaywhu Jun 17, 2025
b39b6d2
Merge pull request #65 from raindaywhu/cy_eplb
raindaywhu Jun 17, 2025
1193c97
default 10 turns to wait worker finished
raindaywhu Jun 17, 2025
aa1660e
Merge pull request #67 from raindaywhu/cy_eplb
raindaywhu Jun 17, 2025
78b7480
fix bug in compose_expert_update_info_bipartite when adding node
wanghanqingLYT Jun 18, 2025
1d9b011
Merge pull request #68 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 18, 2025
8e6b1ee
improve running time in generate_expert_d2d_transfer_task
wanghanqingLYT Jun 18, 2025
6d845f2
Merge pull request #69 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 18, 2025
43def8a
add warm up & batch add
qmkakaxi Jun 18, 2025
130bbb9
Merge pull request #70 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
9219cc8
delete layer moe load
qmkakaxi Jun 18, 2025
c600494
Merge pull request #71 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
2125fe0
add get_tok_ids
qmkakaxi Jun 18, 2025
2403b59
Merge pull request #72 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
4bda9ba
Extract cal_moe_load from deepseek_v2
qmkakaxi Jun 18, 2025
2e824cd
Merge pull request #73 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
1b78fb2
running time reduction forward_before and forward_end
wanghanqingLYT Jun 19, 2025
53728f3
Merge pull request #74 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 19, 2025
e4b1ba0
packed update info and put/get
qmkakaxi Jun 19, 2025
1c8edad
Merge pull request #75 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 19, 2025
a9584dd
add get expert workload
Jun 19, 2025
6592d72
fix bug in pack update info
qmkakaxi Jun 19, 2025
e6a3851
Merge pull request #76 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 19, 2025
17fc31e
improve implementation of generate_log2phy_map
wanghanqingLYT Jun 19, 2025
082e82d
Merge pull request #77 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 19, 2025
926de75
Merge remote-tracking branch 'vllm_main/main' into br_main_into_eplb
qmkakaxi Jun 20, 2025
22de4ee
fix warm up & change init expert map from file
qmkakaxi Jun 20, 2025
e83f89d
add moe load in worker_v1
qmkakaxi Jun 20, 2025
3604f04
Merge remote-tracking branch 'vllm_main/main' into br_main_into_eplb
qmkakaxi Jun 20, 2025
2484055
Merge pull request #78 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
1a21f30
fix warm up bugs
qmkakaxi Jun 20, 2025
38c1234
Merge pull request #79 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
051f77a
fix log2phy bug
qmkakaxi Jun 20, 2025
7b6b474
Merge pull request #80 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
d0c98c9
fix bugs: batch_isend_irecv synchronization and dtype bug in log2phy
wanghanqingLYT Jun 20, 2025
6226dee
Merge pull request #81 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 20, 2025
9295a9c
add another check for new placement generated by eplb algorithm
wanghanqingLYT Jun 21, 2025
67fa706
add dynamic_ep_v2
qmkakaxi Jun 21, 2025
2ccda78
Merge pull request #83 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
7a11221
Merge pull request #82 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 21, 2025
771d4c7
fix dummy_run and profile_run
Jun 21, 2025
ff1076f
Merge pull request #84 from raindaywhu/cy_br_main_into_eplb
raindaywhu Jun 21, 2025
da27c2d
add mock experts_load data
Jun 21, 2025
70a922e
fix bugs in get_init_expert_map_from_file
wanghanqingLYT Jun 21, 2025
89f4376
Merge pull request #86 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 21, 2025
af31373
fix bug in init expert_map_per_layer_cpu
wanghanqingLYT Jun 21, 2025
5751c27
Merge pull request #87 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jun 21, 2025
613c030
add gate_eplb
qmkakaxi Jun 21, 2025
a2505fb
Merge pull request #88 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
057a297
get_init_experts_map in warm up
qmkakaxi Jun 21, 2025
62108d7
Merge pull request #89 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
9f498e9
add update_expert_load_statistical_period logic
Jun 21, 2025
11936d5
add generate expert map
qmkakaxi Jun 21, 2025
e83afa5
Merge remote-tracking branch 'origin/br_main_into_eplb' into br_main_…
qmkakaxi Jun 21, 2025
8907c9c
Merge branch 'br_main_into_eplb' into lt_dev
Jun 21, 2025
d0e8104
add generate_expert_map_all
qmkakaxi Jun 21, 2025
0c8318c
generate expert map
qmkakaxi Jun 21, 2025
ab4bfd2
init expert map
qmkakaxi Jun 21, 2025
adaed7b
Merge pull request #90 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
e6e25f3
fix bugs in get_update_iteration
qmkakaxi Jun 21, 2025
0cfd62c
Merge pull request #91 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
12f0c44
Merge branch 'br_main_into_eplb' into lt_dev
Jun 21, 2025
353150e
fix bug in get_init_expert_map_from_file
qmkakaxi Jun 21, 2025
43d4b87
Merge pull request #92 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
f6830d4
update policy = 6
Jun 22, 2025
041e141
add load_gather_iteration
raindaywhu Jun 22, 2025
f4f9fd7
add code to guarantee there is no expert movement inside a NPU
wanghanqingLYT Jun 22, 2025
7371294
Merge pull request #93 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 22, 2025
5c09eab
新增日志 (add logging)
Jun 22, 2025
1e6b2c6
Merge branch 'lt_dev' of https://github.com/raindaywhu/vllm-ascend in…
Jun 22, 2025
017e0aa
Update policy_factory.py
wanghanqingLYT Jun 22, 2025
976eb9f
Merge pull request #94 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 22, 2025
d537fb2
update
Jun 22, 2025
1a8d238
Merge pull request #85 from raindaywhu/lt_dev
raindaywhu Jun 22, 2025
83f2d51
Merge branch 'br_main_into_eplb' of https://github.com/raindaywhu/vll…
Jun 22, 2025
9e2cca1
dummy run not add moe load
qmkakaxi Jun 22, 2025
5d1ce50
Merge pull request #95 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
edb38e4
fix bug in compute moe load
qmkakaxi Jun 22, 2025
6bbdb15
Merge pull request #96 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
8b31e79
fix bugs in forward_end
qmkakaxi Jun 22, 2025
5225f3c
Merge pull request #97 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
2dba24d
Merge branch 'br_main_into_eplb' of https://github.com/raindaywhu/vll…
Jun 23, 2025
d4d0716
fix conflict
Jun 23, 2025
53e8949
fix some bug
Jun 23, 2025
98b9383
fix precision by fix a wrong branch condition in w8a8_dynamic.py
wanghanqingLYT Jun 23, 2025
a3544ce
Merge pull request #98 from raindaywhu/dev_whq_eplb3
wanghanqingLYT Jun 23, 2025
45766f6
fix code format alignment
Jun 23, 2025
6b36faf
update format
Jun 23, 2025
1a067a3
fix incident for function forward_end in eplb_updator.py
wanghanqingLYT Jun 23, 2025
fc88c4b
Merge pull request #100 from raindaywhu/dev_whq_eplb3
wanghanqingLYT Jun 23, 2025
9c329ed
optimize calculate moe load
qmkakaxi Jun 24, 2025
0897ccc
Merge pull request #101 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 24, 2025
4980f2c
fix bug in moe load & add expert load to josn
qmkakaxi Jun 24, 2025
96fe998
Merge pull request #102 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 24, 2025
da49def
merge from remote main
Jun 24, 2025
9d9c93a
update get_expert_load return type
Jun 24, 2025
162d106
fix bug when running benchmark by move forward_before behind return o…
wanghanqingLYT Jun 25, 2025
c57611c
Merge pull request #103 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 25, 2025
1f0b980
fix SwiftBalancer eplb algo
Jun 26, 2025
bfa07cf
Merge pull request #104 from raindaywhu/new_dev_main_cy
raindaywhu Jun 26, 2025
e7b7186
update get_expert_load logic
Jun 27, 2025
d018ec8
fix get_expert_load
qmkakaxi Jun 27, 2025
6a0a05e
delete invaild print
qmkakaxi Jun 27, 2025
1547810
delete empty tensor judgement
Jun 27, 2025
1b7b87b
Merge pull request #105 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 27, 2025
969751a
merge from remote default branch and fix conflict
Jun 27, 2025
b0e68f7
merge default branch and fix conflict
Jun 27, 2025
3465ad6
relocate the code from the worker_runner to the server side.
Jun 28, 2025
0bab2cd
Merge pull request #99 from raindaywhu/lt_expert_load
raindaywhu Jun 28, 2025
ad5e7e1
collect moe load after dispatch
wanghanqingLYT Jun 30, 2025
e4cba5e
Merge branch 'br_main_into_eplb' into dev_whq_eplb2
wanghanqingLYT Jun 30, 2025
75992b9
Merge pull request #106 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jun 30, 2025
89bcf04
modify serialization of eplb process
wanghanqingLYT Jul 1, 2025
cfbe8b1
Merge pull request #107 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jul 2, 2025
2b62a47
improve d2d expert weight update impl in eplb_updator.py
wanghanqingLYT Jul 3, 2025
d79ace8
Merge pull request #108 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jul 4, 2025
9b32ca4
add function take_update_info_from_eplb_process
wanghanqingLYT Jul 7, 2025
0a5b075
Merge pull request #109 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jul 7, 2025
89247c5
update
Jul 8, 2025
3 changes: 0 additions & 3 deletions .github/dependabot.yml
@@ -2,9 +2,6 @@ version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
# Check for updates to GitHub Actions every week
interval: "weekly"
open-pull-requests-limit: 2
reviewers:
- "Yikun"
6 changes: 1 addition & 5 deletions .github/workflows/nightly_benchmarks.yaml
@@ -18,11 +18,7 @@
name: 'Benchmarks / Performance'
# This workflow runs nightly benchmarks for vllm-ascend.

on:
schedule:
# Run at 02:00 everyday
- cron: '00 18 * * *'

on:
workflow_dispatch:
# Allow manual triggering of the workflow

3 changes: 0 additions & 3 deletions .github/workflows/vllm_ascend_doctest.yaml
@@ -29,9 +29,6 @@ on:
- 'tests/e2e/doctests/**'
- 'tests/e2e/common.sh'
- 'tests/e2e/run_doctests.sh'
schedule:
# Runs every 4 hours
- cron: '0 */4 * * *'

# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
251 changes: 68 additions & 183 deletions .github/workflows/vllm_ascend_test.yaml
@@ -18,8 +18,6 @@
name: 'test'

on:
schedule:
- cron: '0 23 * * *'
pull_request:
branches:
- 'main'
@@ -44,12 +42,6 @@ defaults:
run:
shell: bash -el {0}

# only cancel in-progress runs of the same workflow
# and ignore the lint / 1 card / 4 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
lint:
runs-on: ubuntu-latest
@@ -114,171 +106,32 @@ jobs:
echo "::add-matcher::.github/workflows/matchers/mypy.json"
tools/mypy.sh 1 ${{ matrix.python-version }}

ut:
needs: [lint]
name: unit test
if: ${{ needs.lint.result == 'success' }}
runs-on: ubuntu-latest
container:
image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
strategy:
matrix:
vllm_version: [main, v0.9.1]
steps:
- name: Install packages
run: |
apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev

- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty

- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty python3 -m pip install . --extra-index https://download.pytorch.org/whl/cpu/
python3 -m pip uninstall -y triton

- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4

- name: Install vllm-project/vllm-ascend
run: |
export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
python3 -m pip install -r requirements-dev.txt --extra-index https://download.pytorch.org/whl/cpu/
python3 -m pip install -v . --extra-index https://download.pytorch.org/whl/cpu/

- name: Run unit test for V1 Engine
env:
VLLM_USE_V1: 1
VLLM_WORKER_MULTIPROC_METHOD: spawn
TORCH_DEVICE_BACKEND_AUTOLOAD: 0
run: |
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
pytest -sv tests/ut

e2e:
needs: [lint]
if: ${{ needs.lint.result == 'success' }}
strategy:
max-parallel: 2
matrix:
os: [linux-arm64-npu-1]
vllm_version: [main, v0.9.1]
name: singlecard e2e test
runs-on: ${{ matrix.os }}
container:
# TODO(yikun): Remove m.daocloud.io prefix when infra proxy ready
image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
env:
VLLM_LOGGING_LEVEL: ERROR
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info

- name: Config mirrors
run: |
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y
apt install git -y
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/

- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4

- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev

- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty

- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .

- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -v -e .

- name: Run e2e test for V1 Engine
env:
VLLM_USE_V1: 1
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
run: |
pytest -sv tests/e2e/singlecard/test_offline_inference.py
# TODO: switch hf to modelscope
VLLM_USE_MODELSCOPE=False HF_ENDPOINT=https://hf-mirror.com \
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
# TODO(sss): guided decoding doesn't work, fix it later
# pytest -sv tests/e2e/singlecard/test_guided_decoding.py
pytest -sv tests/e2e/singlecard/test_camem.py
pytest -sv tests/e2e/singlecard/ \
--ignore=tests/e2e/singlecard/test_offline_inference.py \
--ignore=tests/e2e/singlecard/test_ilama_lora.py \
--ignore=tests/e2e/singlecard/test_guided_decoding.py \
--ignore=tests/e2e/singlecard/test_camem.py

- name: Run e2e test on V0 engine
if: ${{ github.event_name == 'schedule' }}
env:
VLLM_USE_V1: 0
VLLM_USE_MODELSCOPE: True
run: |
pytest -sv tests/e2e/singlecard/test_offline_inference.py
# TODO: switch hf to modelscope
VLLM_USE_MODELSCOPE=False HF_ENDPOINT=https://hf-mirror.com \
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
# guided decoding doesn't work, fix it later
# pytest -sv tests/e2e/singlecard/test_guided_decoding.py
pytest -sv tests/e2e/singlecard/test_camem.py
pytest -sv tests/e2e/singlecard/test_prompt_embedding.py
pytest -sv tests/e2e/singlecard/ \
--ignore=tests/e2e/singlecard/test_offline_inference.py \
--ignore=tests/e2e/singlecard/test_ilama_lora.py \
--ignore=tests/e2e/singlecard/test_guided_decoding.py \
--ignore=tests/e2e/singlecard/test_camem.py \
--ignore=tests/e2e/singlecard/test_prompt_embedding.py \
--ignore=tests/e2e/singlecard/core/test_ascend_scheduler.py \
--ignore=tests/e2e/singlecard/core/test_ascend_scheduler_e2e.py

e2e-4-cards:
needs: [e2e]
if: ${{ needs.e2e.result == 'success' }}
strategy:
max-parallel: 1
matrix:
os: [linux-arm64-npu-4]
vllm_version: [main, v0.9.1]
name: multicard e2e test
os: [linux-arm64-npu-1, linux-arm64-npu-4]
vllm_version: [v0.9.1]
concurrency:
group: >
${{
matrix.os == 'linux-arm64-npu-4'
&& github.event.pull_request.number
&& format('pr-{0}-limit-npu-4', github.event.pull_request.number)
|| format('job-{0}-{1}-{2}', matrix.os, matrix.vllm_version, github.event.pull_request.number)
}}
cancel-in-progress: false
name: vLLM Ascend test
runs-on: ${{ matrix.os }}
container:
# TODO(yikun): Remove m.daocloud.io prefix when infra proxy ready
image: m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
env:
HF_ENDPOINT: https://hf-mirror.com
HF_TOKEN: ${{ secrets.HF_TOKEN }}
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
steps:
- name: Check npu and CANN info
run: |
@@ -324,32 +177,64 @@ jobs:
env:
VLLM_USE_V1: 1
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
run: |
# TODO: switch hf to modelscope
VLLM_USE_MODELSCOPE=False HF_ENDPOINT=https://hf-mirror.com \
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
# Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py will raise error.
# To avoid oom, we need to run the test in a single process.
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_W8A8
pytest -sv tests/e2e/multicard/ --ignore=tests/e2e/multicard/test_ilama_lora_tp2.py --ignore=tests/e2e/multicard/test_offline_inference_distributed.py
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
# guided decoding doesn't work, fix it later
# pytest -sv tests/singlecard/test_guided_decoding.py.py
# test_ascend_config.py should be ran separately because it will regenerate the global config many times.
pytest -sv tests/singlecard/test_ascend_config.py
pytest -sv tests/singlecard/test_camem.py
pytest -sv tests/singlecard/core/test_ascend_scheduler.py
pytest -sv tests/singlecard/core/test_ascend_scheduler_e2e.py
pytest -sv tests/singlecard/ \
--ignore=tests/singlecard/test_offline_inference.py \
--ignore=tests/singlecard/test_guided_decoding.py \
--ignore=tests/singlecard/test_ascend_config.py \
--ignore=tests/singlecard/test_camem.py \
--ignore=tests/singlecard/core/test_ascend_scheduler.py \
--ignore=tests/singlecard/core/test_ascend_scheduler_e2e.py
else
pytest -sv tests/multicard/test_ilama_lora_tp2.py
# To avoid oom, we need to run the test in a single process.
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_w4a8_deepseek.py::test_deepseek_W4A8
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_W8A8
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_dbo
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeekV3_dbo
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py --ignore=tests/multicard/test_w4a8_deepseek.py
fi

- name: Run vllm-project/vllm-ascend test on V0 engine
if: ${{ github.event_name == 'schedule' }}
env:
VLLM_USE_V1: 0
VLLM_USE_MODELSCOPE: True
run: |
# TODO: switch hf to modelscope
VLLM_USE_MODELSCOPE=False HF_ENDPOINT=https://hf-mirror.com \
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
# Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py will raise error.
# To avoid oom, we need to run the test in a single process.
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_W8A8
pytest -sv tests/e2e/multicard/ --ignore=tests/e2e/multicard/test_ilama_lora_tp2.py --ignore=tests/e2e/multicard/test_offline_inference_distributed.py
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
VLLM_USE_MODELSCOPE=True pytest -sv tests/singlecard/test_offline_inference.py
# guided decoding doesn't work, fix it later
# pytest -sv tests/singlecard/test_guided_decoding.py.py
pytest -sv tests/singlecard/test_camem.py
# test_ascend_config.py should be ran separately because it will regenerate the global config many times.
pytest -sv tests/singlecard/test_ascend_config.py
pytest -sv tests/singlecard/test_prompt_embedding.py
pytest -sv tests/singlecard/ \
--ignore=tests/singlecard/test_offline_inference.py \
--ignore=tests/singlecard/test_guided_decoding.py \
--ignore=tests/singlecard/test_camem.py \
--ignore=tests/singlecard/test_ascend_config.py \
--ignore=tests/singlecard/test_prompt_embedding.py \
--ignore=tests/singlecard/core/test_ascend_scheduler.py \
--ignore=tests/singlecard/core/test_ascend_scheduler_e2e.py
else
pytest -sv tests/multicard/test_ilama_lora_tp2.py
# Fixme: run VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py will raise error.
# To avoid oom, we need to run the test in a single process.
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_topk
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_W8A8
VLLM_USE_MODELSCOPE=True pytest -sv tests/multicard/ --ignore=tests/multicard/test_ilama_lora_tp2.py --ignore=tests/multicard/test_offline_inference_distributed.py
fi
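
For context, the hunk above folds the former single-card and 4-card e2e jobs into one matrix-driven job and serializes access to the scarce 4-NPU runner with a job-level concurrency group, while a bash branch on the matrix value picks the single-card or multi-card test set. The snippet below is a minimal, self-contained sketch of that pattern; the workflow name, job id, and test paths are illustrative assumptions, not the exact workflow in this PR.

```yaml
# Minimal sketch of the matrix + concurrency pattern shown in the diff above
# (workflow name, job id, and test paths are assumptions for illustration).
name: 'e2e sketch'
on: pull_request

jobs:
  e2e:
    strategy:
      matrix:
        os: [linux-arm64-npu-1, linux-arm64-npu-4]
    runs-on: ${{ matrix.os }}
    concurrency:
      # 4-NPU jobs for the same PR share one group and queue up;
      # everything else gets a unique group and can run in parallel.
      group: >-
        ${{
          matrix.os == 'linux-arm64-npu-4'
          && github.event.pull_request.number
          && format('pr-{0}-limit-npu-4', github.event.pull_request.number)
          || format('job-{0}-{1}', matrix.os, github.run_id)
        }}
      cancel-in-progress: false
    steps:
      - name: Run e2e tests
        run: |
          if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
            pytest -sv tests/singlecard/   # single-card suite (illustrative path)
          else
            pytest -sv tests/multicard/    # multi-card suite (illustrative path)
          fi
```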
23 changes: 12 additions & 11 deletions .github/workflows/vllm_ascend_test_long_term.yaml
@@ -17,9 +17,6 @@
name: 'e2e test / long-term-test'

on:
schedule:
# Runs at 23:00 UTC (7:00 AM Beijing) every day
- cron: '0 23 * * *'
pull_request:
types: [ labeled ]

@@ -43,7 +40,7 @@ jobs:
max-parallel: 2
matrix:
os: [linux-arm64-npu-1, linux-arm64-npu-4]
vllm_version: [main, v0.9.1]
vllm_version: [v0.9.1]
name: vLLM Ascend long term test
runs-on: ${{ matrix.os }}
container:
@@ -97,13 +94,17 @@ jobs:
- name: Run vllm-project/vllm-ascend long term test
run: |
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
# spec decode test
VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode/e2e/test_v1_mtp_correctness.py
# v0 spec decode test
# VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/spec_decode_v0/e2e/test_mtp_correctness.py # it needs a clean process
# pytest -sv tests/long_term/spec_decode_v0 --ignore=tests/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
# v1 spec decode test
# TODO: revert me when test_v1_mtp_correctness.py is fixed
VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/spec_decode_v1/test_v1_mtp_correctness.py
# TODO: revert me when test_v1_spec_decode.py::test_ngram_correctness is fixed
# VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode/e2e/test_v1_spec_decode.py
VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
pytest -sv tests/e2e/long_term/spec_decode --ignore=tests/e2e/long_term/spec_decode/e2e/test_mtp_correctness.py --ignore=tests/e2e/long_term/spec_decode/e2e/test_v1_spec_decode.py --ignore=tests/e2e/long_term/spec_decode/e2e/test_v1_mtp_correctness.py
pytest -sv tests/e2e/long_term/test_accuracy.py
# VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/spec_decode_v1/test_v1_spec_decode.py
# accuracy test single card
pytest -sv tests/long_term/test_accuracy.py
else
VLLM_USE_MODELSCOPE=True pytest -sv tests/e2e/long_term/test_deepseek_v2_lite_tp2_accuracy.py
# accuracy test multi card
VLLM_USE_MODELSCOPE=True pytest -sv tests/long_term/test_deepseek_v2_lite_tp2_accuracy.py
fi
9 changes: 5 additions & 4 deletions .github/workflows/vllm_ascend_test_pd.yaml
@@ -17,9 +17,6 @@
name: 'e2e test / pd-disaggregation'

on:
schedule:
# Runs at 23:00 UTC (7:00 AM Beijing) every day
- cron: '0 23 * * *'
pull_request:
types: [ labeled ]

@@ -41,7 +38,7 @@ jobs:
if: ${{ contains(github.event.pull_request.labels.*.name, 'pd-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
strategy:
matrix:
vllm_verison: [main, v0.9.1]
vllm_verison: [v0.9.1]
name: vLLM Ascend prefilling decoding disaggregation test
runs-on: linux-arm64-npu-static-8

@@ -106,3 +103,7 @@ jobs:
- name: Run vllm-project/vllm-ascend PD Disaggregation test
run: |
pytest -sv tests/e2e/pd_disaggreate/test_pd_e2e.py

- name: Run vllm-project/vllm-ascend PD Disaggregation edge test
run: |
bash tests/e2e/pd_disaggreate/run_edge_case_test.sh