New 091 #1658

Closed
wants to merge 236 commits into from
Commits
5cd5d64
[CI] remove old quantization model (#1003)
22dimensions Jun 10, 2025
71aee6f
Update 0.9.0rc1 contributors info (#1148)
Yikun Jun 10, 2025
e68e81f
[CI] Make accuracy CI and report work (#1078)
zhangxinyuehfad Jun 10, 2025
b75cb78
[Bugfix] add compilation/__init__.py to fix import error (#1152)
wangxiyuan Jun 10, 2025
95414ba
[CI] Run e2e after pre check pass (#1132)
wangxiyuan Jun 10, 2025
2a94321
add eplb policy
Jun 10, 2025
e91956f
add eplb updator
Jun 10, 2025
66b3d2e
implementation of VllmEplbAdaptor and D2DExpertWeightLoader
wanghanqingLYT Jun 10, 2025
05a536c
add eplb policy and updator
raindaywhu Jun 10, 2025
24ca412
Merge pull request #39 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 10, 2025
86fe2c0
determine num_dense_layers and num_moe_layers by referring to model co…
wanghanqingLYT Jun 10, 2025
caeaf2c
Merge pull request #41 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 10, 2025
e68e522
EPLB add eplb_worker
qmkakaxi Jun 10, 2025
f450936
Merge pull request #42 from raindaywhu/dev_mereg_wjh
qmkakaxi Jun 10, 2025
d639144
add ssd loader
qmkakaxi Jun 10, 2025
f1f936b
EPLB moe load collect
qmkakaxi Jun 10, 2025
bd924f2
delete invalid import
qmkakaxi Jun 10, 2025
7e9bb54
Merge pull request #43 from raindaywhu/dev_mereg_wjh
qmkakaxi Jun 10, 2025
291c216
fix torchair execute issue on padding data, and mtp padding logic (#1…
ganyi1996ppo Jun 10, 2025
8dd686d
[MLA][Graph] Improve assertion on Graph mode with MLA (#933)
MengqingCao Jun 10, 2025
8b48daa
[CI] rename Qwen2.5-0.5B-Instruct-W8A8 model (#1145)
22dimensions Jun 10, 2025
04abfd8
[CI] Skip test_v1_spec_decode.py::test_ngram_correctness to make long…
MengqingCao Jun 10, 2025
7bdc606
Support multistream of shared experts in FusedMoE (#997)
sdmyzlp Jun 11, 2025
860a5ef
provide an e2e guide for execute duration profiling (#1113)
depeng1994 Jun 11, 2025
980cd81
etp best a2 (#1101)
ttanzhiqiang Jun 11, 2025
4153a50
[Doc] Fix the config parameter name "enable" in graph_mode.md. (#1159)
yzim Jun 11, 2025
e46dc14
Enable kvcache_nz for the decode process in torchair graph mode (#1098)
chenwaner Jun 11, 2025
4f59644
[CI] Upgrade vllm to 0.9.1 (#1165)
wangxiyuan Jun 11, 2025
afcce8e
fix bugs in fused_experts_with_all2all
wanghanqingLYT Jun 11, 2025
bca9b34
Merge pull request #44 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 11, 2025
cc88ea7
add eplb table generator
Jun 11, 2025
f2d0a75
add eplb table generator
raindaywhu Jun 11, 2025
78079a7
Adapt static EPLB
qmkakaxi Jun 11, 2025
22f03db
Merge branch 'master' into br_wjh_eplb
qmkakaxi Jun 11, 2025
485e3d0
add enable_eplb in ascend_config
qmkakaxi Jun 11, 2025
3393d53
[Scheduler][MTP] Add support for speculative decoding in AscendSched…
whx-sjtu Jun 11, 2025
9e5e117
enable_eplb -> dynamic_eplb
qmkakaxi Jun 11, 2025
2498d29
add custom ascendc kernel vocabparallelembedding (#796)
ttanzhiqiang Jun 12, 2025
dd207cb
[CI][Benchmark] Add new model and v1 test to perf benchmarks (#1099)
Potabk Jun 12, 2025
37f4469
[CI][Benchmark] Add qwen2.5-7b test (#1104)
Potabk Jun 12, 2025
c6e2a5f
[fix] fix bug in 1p1d disaggregated_prefill example (#1184)
wangyanhui-cmss Jun 12, 2025
55c0e68
[Doc] Add Referer header for CANN package download url. (#1192)
wonderful199082 Jun 12, 2025
e72f94e
Support multistream of MLA vector operations (#1135)
sdmyzlp Jun 12, 2025
47b507b
[CI] Recover ut for ascend scheduler only in ci of v1. (#1180)
whx-sjtu Jun 12, 2025
94a52cf
Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203)
Yikun Jun 13, 2025
c28f6cb
add eplb policy
Jun 10, 2025
839cab1
add eplb updator
Jun 10, 2025
6ad801d
implementation of VllmEplbAdaptor and D2DExpertWeightLoader
wanghanqingLYT Jun 10, 2025
34109ac
determine num_dense_layers and num_moe_layers by referring to model co…
wanghanqingLYT Jun 10, 2025
c15f8a8
EPLB add eplb_worker
qmkakaxi Jun 10, 2025
1400ea3
add ssd loader
qmkakaxi Jun 10, 2025
28af393
EPLB moe load collect
qmkakaxi Jun 10, 2025
aa619f4
delete invalid import
qmkakaxi Jun 10, 2025
6fc343c
fix bugs in fused_experts_with_all2all
wanghanqingLYT Jun 11, 2025
b97e066
Adapt static EPLB
qmkakaxi Jun 11, 2025
474a5c3
add eplb table generator
Jun 11, 2025
807348f
add enable_eplb in ascend_config
qmkakaxi Jun 11, 2025
85d29c5
enable_eplb -> dynamic_eplb
qmkakaxi Jun 11, 2025
e4172aa
fix bugs in dynamic eplb
wanghanqingLYT Jun 14, 2025
264a3a5
delete print in fused_moe forward
wanghanqingLYT Jun 14, 2025
7334158
Merge pull request #52 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
a3b5af8
[CI/UT][Graph] Add ut for torchair graph mode (#1103)
MengqingCao Jun 14, 2025
a5dd04f
fix bugs caused by variable name old_placement
wanghanqingLYT Jun 14, 2025
60c87b0
Merge pull request #53 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
241b722
move get_init_expert_map to forward_before
wanghanqingLYT Jun 14, 2025
b888701
Merge pull request #54 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
441508c
fix bug in log2phy in dynamic w8a8
wanghanqingLYT Jun 14, 2025
627757e
Merge pull request #55 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
ab5d110
vllm-ascend support chunked prefill (#1172)
fems14 Jun 14, 2025
88bb99d
fix bug for dim of updated_log2phy_map
wanghanqingLYT Jun 14, 2025
3913395
Merge pull request #56 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 14, 2025
0d2074a
[Doc] fix VLLM_USE_V1 value in graph mode docs (#1226)
22dimensions Jun 15, 2025
4270682
Waiting for BMM NZ support(Improve TPOP 2ms performance) (#1131)
ttanzhiqiang Jun 15, 2025
4ce860a
[CI] Make e2e test to be preemptible and simple (#1217)
Yikun Jun 15, 2025
966557a
[Build] Speedup image build (#1216)
Yikun Jun 16, 2025
dd55ce2
Merge remote-tracking branch 'origin/br_whq_eplb_main' into br_wjh_eplb
qmkakaxi Jun 16, 2025
e86964e
add dynamic_ep alg.
qmkakaxi Jun 16, 2025
9d1893a
fix bugs
qmkakaxi Jun 16, 2025
230fd9c
Merge pull request #57 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 16, 2025
0992bee
fix eplb update log
raindaywhu Jun 16, 2025
91ff797
Merge pull request #59 from raindaywhu/cy_eplb
raindaywhu Jun 16, 2025
69b817e
[CI] Add unit test framework (#1201)
wangxiyuan Jun 16, 2025
0c6210f
fix bugs
qmkakaxi Jun 16, 2025
3b7fd9b
Merge pull request #60 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 16, 2025
f5404dc
Fix the device error when using ray as vllm-acend backend (#884)
zhuo97 Jun 16, 2025
9d63949
improve the implementation of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
0269ef6
Merge pull request #61 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 16, 2025
96fa7ff
[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.…
MengqingCao Jun 16, 2025
9d3cbc0
[Doctest] add installation doctest (#1179)
Yikun Jun 17, 2025
05dec7e
[Doc] Refactor and init user story page (#1224)
Yikun Jun 17, 2025
ae01e08
add compose_expert_update_info_bipartite
qmkakaxi Jun 17, 2025
4b5cd84
adapt compose_expert_update_info_bipartite into eplb process
wanghanqingLYT Jun 17, 2025
74fe5ff
Merge branch 'br_whq_eplb_main' into br_wjh_eplb
wanghanqingLYT Jun 17, 2025
b7bfcc9
Merge pull request #62 from raindaywhu/br_wjh_eplb
wanghanqingLYT Jun 17, 2025
d5dc946
fix bugs
qmkakaxi Jun 16, 2025
86be76f
improve the implementation of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
03abde3
move generate log2ph map to eplb_worker
raindaywhu Jun 17, 2025
23ca68d
[refactor] Refactoring AscendFusedMoE (#1229)
zzzzwwjj Jun 17, 2025
7f77443
fix bugs
qmkakaxi Jun 16, 2025
0b8d00a
improve the implementation of communication between main process and eplb …
wanghanqingLYT Jun 16, 2025
447360f
avoid frequently synchronizing between device and cpu when accessing …
wanghanqingLYT Jun 17, 2025
acf2aee
Merge pull request #63 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 17, 2025
40ee72f
add gate for calculating moe load
qmkakaxi Jun 17, 2025
baacad8
Merge pull request #64 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 17, 2025
4380fdd
fix log2phy
raindaywhu Jun 17, 2025
556169d
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
347f60c
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
49efd9b
fix bugs in expert_map_per_layer_cpu
wanghanqingLYT Jun 17, 2025
352dbca
Merge pull request #66 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 17, 2025
d18aef1
fix log2phy
raindaywhu Jun 17, 2025
d6a76f8
Merge branch 'br_whq_eplb_main' into cy_eplb
raindaywhu Jun 17, 2025
53d0218
fix log2phy
Jun 17, 2025
af10b4a
mv log2phy into eplb worker
raindaywhu Jun 17, 2025
f802994
[Bugfix] Remove cuda related lines and add additional pip mirror (#1252)
Potabk Jun 17, 2025
b39b6d2
Merge pull request #65 from raindaywhu/cy_eplb
raindaywhu Jun 17, 2025
afc8edb
[Bugfix]: Pass scaling args to mc2 (#1202)
jianzs Jun 17, 2025
d7e19ed
[BugFix] fix length of sin/cos cache in rope (#1266)
whx-sjtu Jun 17, 2025
1193c97
default 10 turns to wait worker finished
raindaywhu Jun 17, 2025
aa1660e
Merge pull request #67 from raindaywhu/cy_eplb
raindaywhu Jun 17, 2025
db2f630
[bugfix] fix deepseek with mc2 (#1268)
zzzzwwjj Jun 17, 2025
78b7480
fix bug in compose_expert_update_info_bipartite when adding node
wanghanqingLYT Jun 18, 2025
1d9b011
Merge pull request #68 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 18, 2025
2cd8ecd
[Bugfix][Spec Decode] Enable `ACL_OP_INIT_MODE=1` directly only when …
shen-shanshan Jun 18, 2025
8e6b1ee
improve running time in generate_expert_d2d_transfer_task
wanghanqingLYT Jun 18, 2025
6d845f2
Merge pull request #69 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 18, 2025
ebb2a70
static EPLB fix bug, add unit test (#1186)
songshanhu07 Jun 18, 2025
43def8a
add warm up & batch add
qmkakaxi Jun 18, 2025
130bbb9
Merge pull request #70 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
9219cc8
delete layer moe load
qmkakaxi Jun 18, 2025
c600494
Merge pull request #71 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
2125fe0
add get_tok_ids
qmkakaxi Jun 18, 2025
2403b59
Merge pull request #72 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
4bda9ba
Extract cal_moe_load from deepseek_v2
qmkakaxi Jun 18, 2025
2e824cd
Merge pull request #73 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 18, 2025
1b78fb2
running time reduction forward_before and forward_end
wanghanqingLYT Jun 19, 2025
53728f3
Merge pull request #74 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 19, 2025
e4b1ba0
packed update info and put/get
qmkakaxi Jun 19, 2025
1c8edad
Merge pull request #75 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 19, 2025
a9584dd
add get expert workload
Jun 19, 2025
6592d72
fix bug in pack update info
qmkakaxi Jun 19, 2025
e6a3851
Merge pull request #76 from raindaywhu/br_wjh_eplb
qmkakaxi Jun 19, 2025
17fc31e
improve implementation of generate_log2phy_map
wanghanqingLYT Jun 19, 2025
082e82d
Merge pull request #77 from raindaywhu/dev_whq_eplb
wanghanqingLYT Jun 19, 2025
b350eda
[UT] refactor test_expert_load_balancer and fix broken CI (#1293)
wangxiyuan Jun 19, 2025
926de75
Merge remote-tracking branch 'vllm_main/main' into br_main_into_eplb
qmkakaxi Jun 20, 2025
22de4ee
fix warm up & change init expert map from file
qmkakaxi Jun 20, 2025
e83f89d
add moe load in worker_v1
qmkakaxi Jun 20, 2025
3604f04
Merge remote-tracking branch 'vllm_main/main' into br_main_into_eplb
qmkakaxi Jun 20, 2025
2484055
Merge pull request #78 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
1a21f30
fix warm up bugs
qmkakaxi Jun 20, 2025
38c1234
Merge pull request #79 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
051f77a
fix log2phy bug
qmkakaxi Jun 20, 2025
7b6b474
Merge pull request #80 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 20, 2025
d0c98c9
fix bugs: batch_isend_irecv synchronization and dtype bug in log2phy
wanghanqingLYT Jun 20, 2025
6226dee
Merge pull request #81 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 20, 2025
9295a9c
add another check for new placement generated by eplb algorithm
wanghanqingLYT Jun 21, 2025
67fa706
add dynamic_ep_v2
qmkakaxi Jun 21, 2025
2ccda78
Merge pull request #83 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
7a11221
Merge pull request #82 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 21, 2025
771d4c7
fix dummy_run and profile_run
Jun 21, 2025
ff1076f
Merge pull request #84 from raindaywhu/cy_br_main_into_eplb
raindaywhu Jun 21, 2025
da27c2d
add mock experts_load data
Jun 21, 2025
70a922e
fix bugs in get_init_expert_map_from_file
wanghanqingLYT Jun 21, 2025
89f4376
Merge pull request #86 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 21, 2025
af31373
fix bug in init expert_map_per_layer_cpu
wanghanqingLYT Jun 21, 2025
5751c27
Merge pull request #87 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jun 21, 2025
613c030
add gate_eplb
qmkakaxi Jun 21, 2025
a2505fb
Merge pull request #88 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
057a297
get_init_experts_map in warm up
qmkakaxi Jun 21, 2025
62108d7
Merge pull request #89 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
9f498e9
add update_expert_load_statistical_period logic
Jun 21, 2025
11936d5
add generate expert map
qmkakaxi Jun 21, 2025
e83afa5
Merge remote-tracking branch 'origin/br_main_into_eplb' into br_main_…
qmkakaxi Jun 21, 2025
8907c9c
Merge branch 'br_main_into_eplb' into lt_dev
Jun 21, 2025
d0e8104
add generate_expert_map_all
qmkakaxi Jun 21, 2025
0c8318c
generate expert map
qmkakaxi Jun 21, 2025
ab4bfd2
init expert map
qmkakaxi Jun 21, 2025
adaed7b
Merge pull request #90 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
e6e25f3
fix bugs in get_update_iteration
qmkakaxi Jun 21, 2025
0cfd62c
Merge pull request #91 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
12f0c44
Merge branch 'br_main_into_eplb' into lt_dev
Jun 21, 2025
353150e
fix bug in get_init_expert_map_from_file
qmkakaxi Jun 21, 2025
43d4b87
Merge pull request #92 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 21, 2025
f6830d4
update policy = 6
Jun 22, 2025
041e141
add load_gather_iteration
raindaywhu Jun 22, 2025
f4f9fd7
add code to guarantee there is no expert movement inside a NPU
wanghanqingLYT Jun 22, 2025
7371294
Merge pull request #93 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 22, 2025
5c09eab
Add logging
Jun 22, 2025
1e6b2c6
Merge branch 'lt_dev' of https://github.com/raindaywhu/vllm-ascend in…
Jun 22, 2025
017e0aa
Update policy_factory.py
wanghanqingLYT Jun 22, 2025
976eb9f
Merge pull request #94 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 22, 2025
d537fb2
update
Jun 22, 2025
1a8d238
Merge pull request #85 from raindaywhu/lt_dev
raindaywhu Jun 22, 2025
83f2d51
Merge branch 'br_main_into_eplb' of https://github.com/raindaywhu/vll…
Jun 22, 2025
9e2cca1
dummy run not add moe load
qmkakaxi Jun 22, 2025
5d1ce50
Merge pull request #95 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
edb38e4
fix bug in compute moe load
qmkakaxi Jun 22, 2025
6bbdb15
Merge pull request #96 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
8b31e79
fix bugs in forward_end
qmkakaxi Jun 22, 2025
5225f3c
Merge pull request #97 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 22, 2025
2dba24d
Merge branch 'br_main_into_eplb' of https://github.com/raindaywhu/vll…
Jun 23, 2025
d4d0716
fix conflict
Jun 23, 2025
53e8949
fix some bug
Jun 23, 2025
98b9383
fix precision by fix a wrong branch condition in w8a8_dynamic.py
wanghanqingLYT Jun 23, 2025
a3544ce
Merge pull request #98 from raindaywhu/dev_whq_eplb3
wanghanqingLYT Jun 23, 2025
45766f6
fix code format alignment
Jun 23, 2025
6b36faf
update format
Jun 23, 2025
1a067a3
fix indentation for function forward_end in eplb_updator.py
wanghanqingLYT Jun 23, 2025
fc88c4b
Merge pull request #100 from raindaywhu/dev_whq_eplb3
wanghanqingLYT Jun 23, 2025
9c329ed
optimize calculate moe load
qmkakaxi Jun 24, 2025
0897ccc
Merge pull request #101 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 24, 2025
4980f2c
fix bug in moe load & add expert load to json
qmkakaxi Jun 24, 2025
96fe998
Merge pull request #102 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 24, 2025
da49def
merge from remote main
Jun 24, 2025
9d9c93a
update get_expert_load return type
Jun 24, 2025
162d106
fix bug when running benchmark by move forward_before behind return o…
wanghanqingLYT Jun 25, 2025
c57611c
Merge pull request #103 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jun 25, 2025
1f0b980
fix SwiftBalancer eplb algo
Jun 26, 2025
bfa07cf
Merge pull request #104 from raindaywhu/new_dev_main_cy
raindaywhu Jun 26, 2025
e7b7186
update get_expert_load logic
Jun 27, 2025
d018ec8
fix get_expert_load
qmkakaxi Jun 27, 2025
6a0a05e
delete invalid print
qmkakaxi Jun 27, 2025
1547810
delete empty tensor judgement
Jun 27, 2025
1b7b87b
Merge pull request #105 from raindaywhu/br_main_into_eplb_wjh
qmkakaxi Jun 27, 2025
969751a
merge from remote default branch and fix conflict
Jun 27, 2025
b0e68f7
merge default branch and fix conflict
Jun 27, 2025
3465ad6
relocate the code from the worker_runner to the server side.
Jun 28, 2025
0bab2cd
Merge pull request #99 from raindaywhu/lt_expert_load
raindaywhu Jun 28, 2025
ad5e7e1
collect moe load after dispatch
wanghanqingLYT Jun 30, 2025
e4cba5e
Merge branch 'br_main_into_eplb' into dev_whq_eplb2
wanghanqingLYT Jun 30, 2025
75992b9
Merge pull request #106 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jun 30, 2025
89bcf04
modify serialization of eplb process
wanghanqingLYT Jul 1, 2025
cfbe8b1
Merge pull request #107 from raindaywhu/dev_whq_eplb2
wanghanqingLYT Jul 2, 2025
2b62a47
improve d2d expert weight update impl in eplb_updator.py
wanghanqingLYT Jul 3, 2025
d79ace8
Merge pull request #108 from raindaywhu/dev_whq_eplb1
wanghanqingLYT Jul 4, 2025
3dc10ef
merge update
Jul 8, 2025
3 changes: 0 additions & 3 deletions .github/dependabot.yml
@@ -2,9 +2,6 @@ version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
# Check for updates to GitHub Actions every week
interval: "weekly"
open-pull-requests-limit: 2
reviewers:
- "Yikun"
224 changes: 149 additions & 75 deletions .github/workflows/accuracy_report.yaml
@@ -19,110 +19,184 @@ name: Accuracy Report
on:
workflow_dispatch:
inputs:
branch:
description: 'choose a dev branch to pr'
vllm-ascend-branch:
description: 'vllm-ascend branch:'
required: true
vllm-ascend-version:
description: 'what vllm-ascend version to accuracy test?'
type: choice
options:
- main
- v0.7.3-dev
models:
description: 'models:'
required: true
type: string
type: choice
options:
- all
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
- Qwen/Qwen3-8B-Base
default: 'all'

jobs:
download:
download_reports:
runs-on: ubuntu-latest
strategy:
matrix:
model: ${{ fromJSON(
(github.event.inputs.models == 'all' &&
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen3-8B-Base"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-7B-Instruct' &&
'["Qwen/Qwen2.5-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-VL-7B-Instruct' &&
'["Qwen/Qwen2.5-VL-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen3-8B-Base' &&
'["Qwen/Qwen3-8B-Base"]')
) }}

version: [0, 1]
exclude:
- model: 'Qwen/Qwen2.5-VL-7B-Instruct'
version: 1
fail-fast: false
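The matrix above pairs every selected model with both engine versions and then drops the excluded combination. A rough Python sketch of that expansion (model names taken from the workflow; the job layout itself is an illustration, not workflow code):

```python
# Illustrative sketch of how the download_reports matrix expands:
# each model is paired with versions [0, 1], then the exclude entry
# removes the Qwen2.5-VL-7B-Instruct / V1 combination, leaving five jobs.
from itertools import product

models = [
    "Qwen/Qwen2.5-7B-Instruct",
    "Qwen/Qwen2.5-VL-7B-Instruct",
    "Qwen/Qwen3-8B-Base",
]
versions = [0, 1]
excluded = {("Qwen/Qwen2.5-VL-7B-Instruct", 1)}

jobs = [(m, v) for m, v in product(models, versions) if (m, v) not in excluded]
for model, version in jobs:
    print(f"Download {model} V{version}")
```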

name: Download ${{ matrix.model }} V${{ matrix.version }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}

- name: Debug List Artifacts
run: gh api /repos/${{ github.repository }}/actions/artifacts
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ref: ${{ github.event.inputs.vllm-ascend-branch }}

- name: Query artifact run id for Qwen2.5-VL-7B-Instruct V0 latest artifact
id: get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0
- name: Get base model name
id: get_basename
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-VL-7B-Instruct-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
model_base_name=$(basename "${{ matrix.model }}")
echo "model_base_name=$model_base_name" >> $GITHUB_OUTPUT
shell: bash

- name: Query artifact run id for Qwen2.5-7B-Instruct V0 latest artifact
id: get_Qwen2_5_7B_Instruct_latest_run_id_V0
- name: Query artifact run id
id: get_run_id
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
ARTIFACT_PATTERN="${{ github.event.inputs.vllm-ascend-branch }}-${{ steps.get_basename.outputs.model_base_name }}-V${{ matrix.version }}-report"
echo "Querying artifacts with pattern: $ARTIFACT_PATTERN"
ARTIFACT_JSON=$(gh api --paginate /repos/${{ github.repository }}/actions/artifacts || echo "{}")
RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-7B-Instruct-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
jq -s -r --arg pattern "$ARTIFACT_PATTERN" \
'[.[].artifacts[]] | map(select(.name | test($pattern))) | sort_by(.created_at) | last | .workflow_run.id // empty')
if [ -z "$RUN_ID" ]; then
echo "::warning::No artifact found matching pattern $ARTIFACT_PATTERN. Skipping download."
echo "runid=" >> $GITHUB_OUTPUT
else
echo "Found matching artifact with run ID: $RUN_ID"
echo "runid=$RUN_ID" >> $GITHUB_OUTPUT
fi
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
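The jq query in the step above picks the newest artifact whose name matches the report pattern and emits the workflow run id that produced it. A rough Python emulation (sample artifact records are made up; the real data comes from the GitHub artifacts API):

```python
# Rough emulation of the workflow's jq selection: filter artifacts by
# name pattern, sort by created_at, take the last one's workflow run id,
# falling back to "" like jq's `// empty`. ISO-8601 timestamps sort
# correctly as strings, which is what both jq and this sketch rely on.
import re

def latest_run_id(artifacts, pattern):
    matching = [a for a in artifacts if re.search(pattern, a["name"])]
    if not matching:
        return ""
    newest = max(matching, key=lambda a: a["created_at"])
    return newest["workflow_run"]["id"]

# Hypothetical sample data mirroring the API response shape.
artifacts = [
    {"name": "main-Qwen3-8B-Base-V0-report",
     "created_at": "2025-06-20T10:00:00Z", "workflow_run": {"id": 111}},
    {"name": "main-Qwen3-8B-Base-V0-report",
     "created_at": "2025-06-24T10:00:00Z", "workflow_run": {"id": 222}},
    {"name": "main-Qwen2.5-7B-Instruct-V0-report",
     "created_at": "2025-06-23T10:00:00Z", "workflow_run": {"id": 333}},
]

print(latest_run_id(artifacts, "main-Qwen3-8B-Base-V0-report"))
```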

- name: Query artifact run id for Qwen3-8B-Base V0 latest artifact
id: get_Qwen3_8B_Base_latest_run_id_V0
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen3-8B-Base-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Download Qwen/Qwen2.5-VL-7B-Instruct V0 Artifact
- name: Download Artifact
if: ${{ steps.get_run_id.outputs.runid != '' }}
uses: actions/download-artifact@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-VL-7B-Instruct-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0.outputs.runid }}
name: ${{ github.event.inputs.vllm-ascend-branch }}-${{ steps.get_basename.outputs.model_base_name }}-V${{ matrix.version }}-report
path: ./docs/source/developer_guide/evaluation/accuracy_report_bak
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: ${{ github.repository }}
run-id: ${{ steps.get_run_id.outputs.runid }}

- name: Upload reports artifact
if: ${{ steps.get_run_id.outputs.runid != '' }}
uses: actions/upload-artifact@v4
with:
name: report-${{ steps.get_basename.outputs.model_base_name }}-v${{ matrix.version }}
path: ./docs/source/developer_guide/evaluation/accuracy_report_bak/*.md
retention-days: 90

- name: Download Qwen/Qwen2.5-7B-Instruct Artifact
uses: actions/download-artifact@v4
create_pr:
runs-on: ubuntu-latest
needs: download_reports
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-7B-Instruct-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen2_5_7B_Instruct_latest_run_id_V0.outputs.runid }}
ref: ${{ github.event.inputs.vllm-ascend-branch }}

- name: Setup workspace
run: mkdir -p ./accuracy/accuracy_report

- name: Download Qwen/Qwen3-8B-Base Artifact
- name: Download only current run reports
uses: actions/download-artifact@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen3-8B-Base-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
pattern: report-*
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen3_8B_Base_latest_run_id_V0.outputs.runid }}
run-id: ${{ github.run_id }}

- name: Delete old report
run: |
find ./docs/source/developer_guide/evaluation/accuracy_report -maxdepth 1 -type f -name '*.md' ! -name 'index.md' -delete
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 2 -type f -name '*.md' -exec mv -f {} ./docs/source/developer_guide/evaluation/accuracy_report \;
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 1 -type d -empty -delete
- name: Display Files
working-directory: ./docs/source/developer_guide/evaluation/accuracy_report
- name: Generate step summary
if: ${{ always() }}
run: |
cat ./Qwen2.5-VL-7B-Instruct.md
cat ./Qwen2.5-7B-Instruct.md
cat ./Qwen3-8B-Base.md
- name: Create Pull Request for markdown update
for report in ./docs/source/developer_guide/evaluation/accuracy_report/*.md; do
filename=$(basename "$report")
# skip index.md
if [ "$filename" = "index.md" ]; then
continue
fi
if [ -f "$report" ]; then
{
echo -e "\n\n---\n"
echo "## 📄 Report File: $(basename $report)"
cat "$report"
} >> "$GITHUB_STEP_SUMMARY"
fi
done
- name: Update accuracy_report/index.md
run: |
REPORT_DIR="./docs/source/developer_guide/evaluation/accuracy_report"
INDEX_MD="$REPORT_DIR/index.md"
{
echo "# Accuracy Report"
echo ""
echo "::: {toctree}"
echo ":caption: Accuracy Report"
echo ":maxdepth: 1"
for report in "$REPORT_DIR"/*.md; do
filename="$(basename "$report" .md)"
if [ "$filename" != "index" ]; then
echo "$filename"
fi
done
echo ":::"
} > "$INDEX_MD"
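The index regeneration above writes a MyST toctree listing every report page except `index.md` itself. A small Python sketch of the same logic (page names are examples; the real step runs as shell in the workflow):

```python
# Sketch of the index.md regeneration step: emit a MyST toctree that
# lists every report page except index itself.
def build_index(report_pages):
    lines = [
        "# Accuracy Report",
        "",
        "::: {toctree}",
        ":caption: Accuracy Report",
        ":maxdepth: 1",
    ]
    lines += [page for page in report_pages if page != "index"]
    lines.append(":::")
    return "\n".join(lines)

pages = ["Qwen2.5-7B-Instruct", "Qwen3-8B-Base", "index"]
print(build_index(pages))
```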
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.PR_TOKEN }}
base: ${{ github.event.inputs.branch }}
branch: auto-pr/accuracy-test
commit-message: "Update accuracy report for ${{ github.event.inputs.branch }}"
base: ${{ github.event.inputs.vllm-ascend-branch }}
branch: auto-pr/accuracy-report
commit-message: "Update accuracy reports for ${{ github.event.inputs.vllm-ascend-branch }}"
add-paths: ./docs/source/developer_guide/evaluation/accuracy_report/*.md
title: "[Doc]Update accuracy report for ${{ github.event.inputs.branch }}"
title: "[Doc] Update accuracy reports for ${{ github.event.inputs.vllm-ascend-branch }}"
body: |
The accuracy results running on Ascend NPU have changed, I'm updating the report.
Please review the changes.
The accuracy results running on NPU Atlas A2 have changed, updating reports for:
${{
github.event.inputs.models == 'all'
&& 'All models (Qwen2.5-7B-Instruct, Qwen2.5-VL-7B-Instruct, Qwen3-8B-Base)'
|| github.event.inputs.models
}}
- [Workflow run][1]
- [Qwen2.5-7B-Instruct accuracy report][2]
- [Qwen2.5-VL-7B-Instruct accuracy report][3]
- [Qwen3-8B-Base accuracy report][4]
[1]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
[2]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen2_5_7B_Instruct_latest_run_id_V0.outputs.runid }}
[3]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0.outputs.runid }}
[4]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen3_8B_Base_latest_run_id_V0.outputs.runid }}
[1]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
20 changes: 14 additions & 6 deletions .github/workflows/accuracy_test.yaml
@@ -34,8 +34,7 @@ on:
# Current supported vLLM versions
options:
- main
- v0.9.0.1
- v0.9.0
- v0.9.1
- v0.7.3
vllm-ascend-version:
description: 'vllm-ascend version:'
@@ -96,7 +95,7 @@ jobs:
# - vl-accuracy-test: Qwen/Qwen2.5-VL-7B-Instruct
model_name: ${{ fromJSON(
(github.event.inputs.models == 'all' &&
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","model_name":"Qwen/Qwen3-8B-Base"]') ||
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen3-8B-Base"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-7B-Instruct' &&
'["Qwen/Qwen2.5-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-VL-7B-Instruct' &&
@@ -159,7 +158,7 @@ jobs:
repository: vllm-project/vllm
path: ./vllm-empty
# Please also update this when bump matched version
ref: ${{ github.event.inputs.vllm-version || 'v0.9.0' }}
ref: ${{ github.event.inputs.vllm-version || 'v0.9.1' }}

- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
@@ -174,6 +173,8 @@ jobs:

- name: Install vllm-project/vllm-ascend
working-directory: ./vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -e .
@@ -201,6 +202,7 @@ jobs:
pip show torch | grep "Version:" | awk '{print "GHA_TORCH_VERSION="$2}'
pip show torch_npu | grep "Version:" | awk '{print "GHA_TORCH_NPU_VERSION="$2}'
pip show vllm | grep "Version:" | awk '{print "GHA_VLLM_VERSION="$2}' | sed 's/+.*//'
echo "GHA_VLLM_ASCEND_VERSION=${{ github.event.inputs.vllm-ascend-version || github.ref }}"
} >> "$GITHUB_ENV"
- name: Print versions
@@ -209,7 +211,7 @@
echo "Torch NPU: ${{ env.GHA_TORCH_NPU_VERSION }}"
echo "Torch: ${{ env.GHA_TORCH_VERSION }}"
echo "vLLM: ${{ env.GHA_VLLM_VERSION }}"
echo "vLLM Ascend: ${{ env.GHA_VLLM_ASCEND_VERSION || github.ref }}"
echo "vLLM Ascend: ${{ env.GHA_VLLM_ASCEND_VERSION }}"
- name: Run Accuracy Test for V${{ matrix.vllm_use_version }}
id: report
@@ -238,10 +240,16 @@ jobs:
run: |
cat ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md >> $GITHUB_STEP_SUMMARY
- name: Sanitize version string for artifact naming
run: |
SAFE_VLLM_ASCEND_VERSION="${GHA_VLLM_ASCEND_VERSION//\//-}"
echo "SAFE_VLLM_ASCEND_VERSION=$SAFE_VLLM_ASCEND_VERSION" >> "$GITHUB_ENV"
- name: Upload Report for V${{ matrix.vllm_use_version }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/upload-artifact@v4
with:
name: "${{ env.GHA_VLLM_ASCEND_VERSION }}-${{ steps.report.outputs.markdown_name }}-report"
name: "${{ env.SAFE_VLLM_ASCEND_VERSION }}-${{ steps.report.outputs.markdown_name }}-report"
path: ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md
if-no-files-found: warn
retention-days: 90
53 changes: 0 additions & 53 deletions .github/workflows/actionlint.yml

This file was deleted.
