[cherry-pick][0.9.1] rebase main #1250

Merged
merged 28 commits into from
Jun 17, 2025
28 commits
07411f9
[CI] remove old quantization model (#1003)
22dimensions Jun 10, 2025
9181e92
Update 0.9.0rc1 contributors info (#1148)
Yikun Jun 10, 2025
3e55d9e
[CI] Make accuarcy CI and report work (#1078)
zhangxinyuehfad Jun 10, 2025
d1095bc
[Bugfix] add compilation/__init__.py to fix import error (#1152)
wangxiyuan Jun 10, 2025
7eb9f23
[CI] Run e2e after pre check pass (#1132)
wangxiyuan Jun 10, 2025
9861dc5
[MLA][Graph] Improve assertion on Graph mode with MLA (#933)
MengqingCao Jun 10, 2025
cf419aa
[CI] rename Qwen2.5-0.5B-Instruct-W8A8 model (#1145)
22dimensions Jun 10, 2025
b97e79c
[CI] Skip test_v1_spec_decode.py::test_ngram_correctness to make long…
MengqingCao Jun 10, 2025
87a74bb
Support multistream of shared experts in FusedMoE (#997)
sdmyzlp Jun 11, 2025
5f89652
provide an e2e guide for execute duration profiling (#1113)
depeng1994 Jun 11, 2025
8326314
etp best a2 (#1101)
ttanzhiqiang Jun 11, 2025
933e261
[Doc] Fix the config parameter name "enable" in graph_mode.md. (#1159)
yzim Jun 11, 2025
6ba3c10
Enable kvcache_nz for the decode process in torchair graph mode (#1098)
chenwaner Jun 11, 2025
b686540
[CI] Upgrade vllm to 0.9.1 (#1165)
wangxiyuan Jun 11, 2025
a265a4f
[Scheduler][MTP] Add support for speculative decoding in AsecendSched…
whx-sjtu Jun 11, 2025
fcd5ad8
add custom ascendc kernel vocabparallelembedding (#796)
ttanzhiqiang Jun 12, 2025
9bea014
[CI][Benchmark] Add new model and v1 test to perf benchmarks (#1099)
Potabk Jun 12, 2025
d6aacdf
[CI][Benchmark] Add qwen2.5-7b test (#1104)
Potabk Jun 12, 2025
e0ef036
[fix] fix bug in 1p1d disaggregated_prefill example (#1184)
wangyanhui-cmss Jun 12, 2025
f0ee1c3
[Doc] Add Referer header for CANN package download url. (#1192)
wonderful199082 Jun 12, 2025
634325c
Support multistream of MLA vector operations (#1135)
sdmyzlp Jun 12, 2025
cd55d0c
[CI] Recover ut for ascend scheduler only in ci of v1. (#1180)
whx-sjtu Jun 12, 2025
a4294bb
Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203)
Yikun Jun 13, 2025
ca0ab36
[CI/UT][Graph] Add ut for torchair graph mode (#1103)
MengqingCao Jun 14, 2025
fa22e88
[Doc] fix VLLM_USE_V1 value in graph mode docs (#1226)
22dimensions Jun 15, 2025
45e802c
Fix the device error when using ray as vllm-acend backend (#884)
zhuo97 Jun 16, 2025
94d0f07
remove main vll verison.
Jun 17, 2025
f9491b8
Revert "Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203)"
Jun 17, 2025
224 changes: 149 additions & 75 deletions .github/workflows/accuracy_report.yaml
@@ -19,110 +19,184 @@ name: Accuracy Report
on:
workflow_dispatch:
inputs:
branch:
description: 'choose a dev branch to pr'
vllm-ascend-branch:
description: 'vllm-ascend branch:'
required: true
vllm-ascend-version:
description: 'what vllm-ascend version to accuracy test?'
type: choice
options:
- main
- v0.7.3-dev
models:
description: 'models:'
required: true
type: string
type: choice
options:
- all
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
- Qwen/Qwen3-8B-Base
default: 'all'

jobs:
download:
download_reports:
runs-on: ubuntu-latest
strategy:
matrix:
model: ${{ fromJSON(
(github.event.inputs.models == 'all' &&
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen3-8B-Base"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-7B-Instruct' &&
'["Qwen/Qwen2.5-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-VL-7B-Instruct' &&
'["Qwen/Qwen2.5-VL-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen3-8B-Base' &&
'["Qwen/Qwen3-8B-Base"]')
) }}

version: [0, 1]
exclude:
- model: 'Qwen/Qwen2.5-VL-7B-Instruct'
version: 1
fail-fast: false

name: Download ${{ matrix.model }} V${{ matrix.version }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}

- name: Debug List Artifacts
run: gh api /repos/${{ github.repository }}/actions/artifacts
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ref: ${{ github.event.inputs.vllm-ascend-branch }}

- name: Query artifact run id for Qwen2.5-VL-7B-Instruct V0 latest artifact
id: get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0
- name: Get base model name
id: get_basename
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-VL-7B-Instruct-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
model_base_name=$(basename "${{ matrix.model }}")
echo "model_base_name=$model_base_name" >> $GITHUB_OUTPUT
shell: bash

- name: Query artifact run id for Qwen2.5-7B-Instruct V0 latest artifact
id: get_Qwen2_5_7B_Instruct_latest_run_id_V0
- name: Query artifact run id
id: get_run_id
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
ARTIFACT_PATTERN="${{ github.event.inputs.vllm-ascend-branch }}-${{ steps.get_basename.outputs.model_base_name }}-V${{ matrix.version }}-report"
echo "Querying artifacts with pattern: $ARTIFACT_PATTERN"

ARTIFACT_JSON=$(gh api --paginate /repos/${{ github.repository }}/actions/artifacts || echo "{}")

RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-7B-Instruct-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
jq -s -r --arg pattern "$ARTIFACT_PATTERN" \
'[.[].artifacts[]] | map(select(.name | test($pattern))) | sort_by(.created_at) | last | .workflow_run.id // empty')

if [ -z "$RUN_ID" ]; then
echo "::warning::No artifact found matching pattern $ARTIFACT_PATTERN. Skipping download."
echo "runid=" >> $GITHUB_OUTPUT
else
echo "Found matching artifact with run ID: $RUN_ID"
echo "runid=$RUN_ID" >> $GITHUB_OUTPUT
fi
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Query artifact run id for Qwen3-8B-Base V0 latest artifact
id: get_Qwen3_8B_Base_latest_run_id_V0
run: |
ARTIFACT_JSON=$(gh api "repos/${{ github.repository }}/actions/artifacts")
RUN_ID=$(echo "$ARTIFACT_JSON" | \
jq -r '[.artifacts[] | select(.name=="${{ github.event.inputs.vllm-ascend-version }}-Qwen3-8B-Base-V0-report")] | sort_by(.created_at) | last | .workflow_run.id')
echo "runid=$RUN_ID" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Download Qwen/Qwen2.5-VL-7B-Instruct V0 Artifact
- name: Download Artifact
if: ${{ steps.get_run_id.outputs.runid != '' }}
uses: actions/download-artifact@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-VL-7B-Instruct-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0.outputs.runid }}
name: ${{ github.event.inputs.vllm-ascend-branch }}-${{ steps.get_basename.outputs.model_base_name }}-V${{ matrix.version }}-report
path: ./docs/source/developer_guide/evaluation/accuracy_report_bak
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: ${{ github.repository }}
run-id: ${{ steps.get_run_id.outputs.runid }}

- name: Upload reports artifact
if: ${{ steps.get_run_id.outputs.runid != '' }}
uses: actions/upload-artifact@v4
with:
name: report-${{ steps.get_basename.outputs.model_base_name }}-v${{ matrix.version }}
path: ./docs/source/developer_guide/evaluation/accuracy_report_bak/*.md
retention-days: 90

- name: Download Qwen/Qwen2.5-7B-Instruct Artifact
uses: actions/download-artifact@v4
create_pr:
runs-on: ubuntu-latest
needs: download_reports
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen2.5-7B-Instruct-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen2_5_7B_Instruct_latest_run_id_V0.outputs.runid }}
ref: ${{ github.event.inputs.vllm-ascend-branch }}

- name: Setup workspace
run: mkdir -p ./accuracy/accuracy_report

- name: Download Qwen/Qwen3-8B-Base Artifact
- name: Download only current run reports
uses: actions/download-artifact@v4
with:
name: ${{ github.event.inputs.vllm-ascend-version }}-Qwen3-8B-Base-V0-report
path: ./docs/source/developer_guide/evaluation/accuracy_report
pattern: report-*
github-token: ${{ secrets.GITHUB_TOKEN }}
repository: vllm-project/vllm-ascend
run-id: ${{ steps.get_Qwen3_8B_Base_latest_run_id_V0.outputs.runid }}
run-id: ${{ github.run_id }}

- name: Delete old report
run: |
find ./docs/source/developer_guide/evaluation/accuracy_report -maxdepth 1 -type f -name '*.md' ! -name 'index.md' -delete
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 2 -type f -name '*.md' -exec mv -f {} ./docs/source/developer_guide/evaluation/accuracy_report \;
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 1 -type d -empty -delete

- name: Display Files
working-directory: ./docs/source/developer_guide/evaluation/accuracy_report
- name: Generate step summary
if: ${{ always() }}
run: |
cat ./Qwen2.5-VL-7B-Instruct.md
cat ./Qwen2.5-7B-Instruct.md
cat ./Qwen3-8B-Base.md

- name: Create Pull Request for markdown update
for report in ./docs/source/developer_guide/evaluation/accuracy_report/*.md; do
filename=$(basename "$report")
# skip index.md
if [ "$filename" = "index.md" ]; then
continue
fi

if [ -f "$report" ]; then
{
echo -e "\n\n---\n"
echo "## 📄 Report File: $(basename $report)"
cat "$report"
} >> "$GITHUB_STEP_SUMMARY"
fi
done

- name: Update accuracy_report/index.md
run: |
REPORT_DIR="./docs/source/developer_guide/evaluation/accuracy_report"
INDEX_MD="$REPORT_DIR/index.md"

{
echo "# Accuracy Report"
echo ""
echo "::: {toctree}"
echo ":caption: Accuracy Report"
echo ":maxdepth: 1"

for report in "$REPORT_DIR"/*.md; do
filename="$(basename "$report" .md)"
if [ "$filename" != "index" ]; then
echo "$filename"
fi
done

echo ":::"
} > "$INDEX_MD"

- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.PR_TOKEN }}
base: ${{ github.event.inputs.branch }}
branch: auto-pr/accuracy-test
commit-message: "Update accuracy report for ${{ github.event.inputs.branch }}"
base: ${{ github.event.inputs.vllm-ascend-branch }}
branch: auto-pr/accuracy-report
commit-message: "Update accuracy reports for ${{ github.event.inputs.vllm-ascend-branch }}"
add-paths: ./docs/source/developer_guide/evaluation/accuracy_report/*.md
title: "[Doc]Update accuracy report for ${{ github.event.inputs.branch }}"
title: "[Doc] Update accuracy reports for ${{ github.event.inputs.vllm-ascend-branch }}"
body: |
The accuracy results running on Ascend NPU have changed, I'm updating the report.
Please review the changes.

The accuracy results running on NPU Altlas A2 have changed, updating reports for:
${{
github.event.inputs.models == 'all'
&& 'All models (Qwen2.5-7B-Instruct, Qwen2.5-VL-7B-Instruct, Qwen3-8B-Base)'
|| github.event.inputs.models
}}

- [Workflow run][1]
- [Qwen2.5-7B-Instruct accuracy report][2]
- [Qwen2.5-VL-7B-Instruct accuracy report][3]
- [Qwen3-8B-Base accuracy report][4]

[1]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
[2]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen2_5_7B_Instruct_latest_run_id_V0.outputs.runid }}
[3]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen2_5_VL_7B_Instruct_latest_run_id_V0.outputs.runid }}
[4]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ steps.get_Qwen3_8B_Base_latest_run_id_V0.outputs.runid }}

[1]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
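The "Update accuracy_report/index.md" step above regenerates a MyST-style `{toctree}` from whatever report files are present. Below is a minimal bash sketch of the same logic. It wraps the step's commands in a hypothetical `write_index` function that takes the report directory as an argument, so it can be tried against a scratch directory. It is an illustration, not the workflow's own code.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the "Update accuracy_report/index.md" step:
# rebuild index.md as a toctree listing every *.md report except index.md.
write_index() {
  local report_dir="$1"
  local index_md="$report_dir/index.md"
  {
    echo "# Accuracy Report"
    echo ""
    echo "::: {toctree}"
    echo ":caption: Accuracy Report"
    echo ":maxdepth: 1"
    # The redirection below creates/truncates index.md before this loop runs,
    # so the glob always matches at least index.md, which is skipped here.
    for report in "$report_dir"/*.md; do
      local name
      name="$(basename "$report" .md)"
      if [ "$name" != "index" ]; then
        echo "$name"
      fi
    done
    echo ":::"
  } > "$index_md"
}
```

For example, `write_index ./docs/source/developer_guide/evaluation/accuracy_report` would list each model report (such as `Qwen3-8B-Base`) as a toctree entry. Because the file is rewritten in full rather than appended to, reports deleted by the earlier "Delete old report" step also drop out of the index.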
18 changes: 12 additions & 6 deletions .github/workflows/accuracy_test.yaml
@@ -34,8 +34,7 @@ on:
# Current supported vLLM versions
options:
- main
- v0.9.0.1
- v0.9.0
- v0.9.1
- v0.7.3
vllm-ascend-version:
description: 'vllm-ascend version:'
@@ -96,7 +95,7 @@ jobs:
# - vl-accuracy-test: Qwen/Qwen2.5-VL-7B-Instruct
model_name: ${{ fromJSON(
(github.event.inputs.models == 'all' &&
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","model_name":"Qwen/Qwen3-8B-Base"]') ||
'["Qwen/Qwen2.5-7B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen3-8B-Base"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-7B-Instruct' &&
'["Qwen/Qwen2.5-7B-Instruct"]') ||
(github.event.inputs.models == 'Qwen/Qwen2.5-VL-7B-Instruct' &&
@@ -159,7 +158,7 @@ jobs:
repository: vllm-project/vllm
path: ./vllm-empty
# Please also update this when bump matched version
ref: ${{ github.event.inputs.vllm-version || 'v0.9.0' }}
ref: ${{ github.event.inputs.vllm-version || 'v0.9.1' }}

- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
@@ -201,6 +200,7 @@ jobs:
pip show torch | grep "Version:" | awk '{print "GHA_TORCH_VERSION="$2}'
pip show torch_npu | grep "Version:" | awk '{print "GHA_TORCH_NPU_VERSION="$2}'
pip show vllm | grep "Version:" | awk '{print "GHA_VLLM_VERSION="$2}' | sed 's/+.*//'
echo "GHA_VLLM_ASCEND_VERSION=${{ github.event.inputs.vllm-ascend-version || github.ref }}"
} >> "$GITHUB_ENV"

- name: Print versions
@@ -209,7 +209,7 @@ jobs:
echo "Torch NPU: ${{ env.GHA_TORCH_NPU_VERSION }}"
echo "Torch: ${{ env.GHA_TORCH_VERSION }}"
echo "vLLM: ${{ env.GHA_VLLM_VERSION }}"
echo "vLLM Ascend: ${{ env.GHA_VLLM_ASCEND_VERSION || github.ref }}"
echo "vLLM Ascend: ${{ env.GHA_VLLM_ASCEND_VERSION }}"

- name: Run Accuracy Test for V${{ matrix.vllm_use_version }}
id: report
@@ -238,10 +238,16 @@ jobs:
run: |
cat ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md >> $GITHUB_STEP_SUMMARY

- name: Sanitize version string for artifact naming
run: |
SAFE_VLLM_ASCEND_VERSION="${GHA_VLLM_ASCEND_VERSION//\//-}"
echo "SAFE_VLLM_ASCEND_VERSION=$SAFE_VLLM_ASCEND_VERSION" >> "$GITHUB_ENV"

- name: Upload Report for V${{ matrix.vllm_use_version }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/upload-artifact@v4
with:
name: "${{ env.GHA_VLLM_ASCEND_VERSION }}-${{ steps.report.outputs.markdown_name }}-report"
name: "${{ env.SAFE_VLLM_ASCEND_VERSION }}-${{ steps.report.outputs.markdown_name }}-report"
path: ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md
if-no-files-found: warn
retention-days: 90
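The new "Sanitize version string for artifact naming" step matters because `GHA_VLLM_ASCEND_VERSION` can fall back to `github.ref` (for example `refs/heads/main`), and `actions/upload-artifact@v4` rejects artifact names that contain `/`. The bash sketch below shows how the sanitized upload name is presumably meant to line up with the pattern that accuracy_report.yaml queries (`<branch>-<model basename>-V<n>-report`). `sanitize_version` reproduces the step's `${VAR//\//-}` expansion. `artifact_name` is a hypothetical helper, and it assumes that `steps.report.outputs.markdown_name` has the form `<model basename>-V<n>`, as the previously hard-coded names such as `...-Qwen2.5-VL-7B-Instruct-V0-report` suggest.

```shell
#!/usr/bin/env bash
# Mirrors the "Sanitize version string" step: replace every "/" with "-".
sanitize_version() {
  printf '%s\n' "${1//\//-}"
}

# Hypothetical helper: compose "<version>-<model basename>-V<n>-report".
# Assumption: markdown_name in accuracy_test.yaml is "<model basename>-V<n>".
artifact_name() {
  local version="$1" model="$2" vllm_use_version="$3"
  printf '%s-%s-V%s-report\n' \
    "$(sanitize_version "$version")" \
    "$(basename "$model")" \
    "$vllm_use_version"
}
```

For instance, `artifact_name main Qwen/Qwen3-8B-Base 1` yields `main-Qwen3-8B-Base-V1-report`, which is the name the report workflow searches for when `vllm-ascend-branch` is `main`. Note that accuracy_report.yaml builds its pattern from the raw branch input without this sanitization, so a branch name containing `/` would likely not match the uploaded artifact.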
53 changes: 0 additions & 53 deletions .github/workflows/actionlint.yml

This file was deleted.
