
Commit f835056

[CI] upgrade vllm to 0.8.5 (#715)
1. Upgrade vllm to 0.8.5
2. Drop 0.8.4 support
3. Keep the docs pinned to 0.8.4rc2 until 0.8.5 is released

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent 95e7aa4 · commit f835056

20 files changed (+48, −579 lines)

.github/workflows/vllm_ascend_test.yaml

Lines changed: 1 addition & 1 deletion

@@ -48,7 +48,7 @@ jobs:
       max-parallel: 2
       matrix:
         os: [linux-arm64-npu-1, linux-arm64-npu-4]
-        vllm_verison: [main, v0.8.4]
+        vllm_verison: [main, v0.8.5]
     concurrency:
       group: >
         ${{

Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ RUN pip config set global.index-url ${PIP_INDEX_URL}
 
 # Install vLLM
 ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
-ARG VLLM_TAG=v0.8.4
+ARG VLLM_TAG=v0.8.5
 RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /workspace/vllm
 # In x86, triton will be installed by vllm. But in Ascend, triton doesn't work correctly. we need to uninstall it.
 RUN VLLM_TARGET_DEVICE="empty" python3 -m pip install -v -e /workspace/vllm/ --extra-index https://download.pytorch.org/whl/cpu/ && \

Dockerfile.openEuler

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ COPY . /workspace/vllm-ascend/
 
 # Install vLLM
 ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
-ARG VLLM_TAG=v0.8.4
+ARG VLLM_TAG=v0.8.5
 
 RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /workspace/vllm
 # In x86, triton will be installed by vllm. But in Ascend, triton doesn't work correctly. we need to uninstall it.

vllm_ascend/__init__.py

Lines changed: 0 additions & 4 deletions

@@ -23,9 +23,5 @@ def register():
 
 
 def register_model():
-    # TODO: fixme when TritonPlaceholder fixed
-    from vllm_ascend.utils import vllm_version_is
-    if vllm_version_is("0.8.4"):
-        import vllm_ascend.patch.worker.patch_0_8_4.patch_tritonplaceholder  # noqa
     from .models import register_model
     register_model()

vllm_ascend/patch/__init__.py

Lines changed: 3 additions & 53 deletions

@@ -24,9 +24,9 @@
 # each worker's `__init__` function.
 #
 # Then in each kind of patch, there are three folders:
-# - patch_0_8_4: contains the patches applied when vllm version is 0.8.4.
+# - patch_0_8_5: contains the patches applied when vllm version is 0.8.5.
 # - patch_main: contains the patches applied when vllm version is main branch.
-# - patch_common: contains the patches applied in both 0.8.4 and main branch.
+# - patch_common: contains the patches applied in both 0.8.5 and main branch.
 #
 # In the future, with the vllm version upgrade, the new patch folder such as
 # patch_0_8_5, patch_0_8_6, etc. will be added to manage the patch for different
@@ -42,18 +42,6 @@
 # --------------------------------
 # * Platform Patch:
 # =================
-# ** File: platform/patch_0_8_4/patch_config.py**
-#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#    1. `vllm.config.ModelConfig.__init__()`
-#    Why:
-#       It is hard coded for sleep mode to support cuda platform only
-#    How:
-#       Using a new method to check if sleep mode is available
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       https://github.com/vllm-project/vllm/pull/16562
-#    Future Plan:
-#       This patch is only used for 084 and can't be revert. just keep as it is.
-#
 # ** File: platform/patch_common/patch_distributed.py**
 #    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #    1. `vllm.distributed.parallel_state.destroy_model_parallel()`
@@ -100,33 +88,6 @@
 #
 # * Worker Patch:
 # ===============
-# ** File: worker/patch_0_8_4/patch_metrics.py **
-#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#    1. `vllm.spec_decode.metrics.AsyncMetricsCollector.init_tensors` and
-#       `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
-#    Why:
-#       There are cuda hard code (torch.cuda.Stream) in `AsyncMetricsCollector.init_tensors` and
-#       `AsyncMetricsCollector._copy_rejsample_metrics_async`
-#    How:
-#       Replace it with the corresponding npu method
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       https://github.com/vllm-project/vllm/pull/14411
-#    Future Plan:
-#       Revert it when the related pr is merged in vllm.
-#
-# ** File: worker/patch_0_8_4/patch_spec_decode_worker.py **
-#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#    1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker._configure_model_sampler_for_spec_decode`
-#    Why:
-#       vLLM `Remove Sampler from Model Code` so vllm-ascend needs a patch to run in v0.8.4.
-#    How:
-#       Use vLLM 0.8.4 method tp patch it.
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       - https://github.com/vllm-project/vllm/pull/17084
-#       - https://github.com/vllm-project/vllm-ascend/pull/636
-#    Future Plan:
-#       Follow v0.8.4 version strategy.
-#
 # ** File: worker/patch_common/patch_metrics.py **
 #    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #    1. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
@@ -197,15 +158,4 @@
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
 #       Revert it when the related pr is merged in vllm and vllm-ascend.
-#
-# ** File: worker/patch_0_8_4/patch_tritonplaceholder.py **
-#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#    1. `triton` Module
-#    Why:
-#       Triton is not supported on npu currently, importing triton will break vllm-ascend
-#    How:
-#       ditto
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       TritonPlaceholder is only available in vllm>0.8.4
-#    Future Plan:
-#       Revert it when branch main doesn't maintain v0.8.4.
+#
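The deleted patch_tritonplaceholder entry above describes a module-stubbing technique: registering a placeholder `triton` module so that a later `import triton` does not fail on platforms (such as Ascend NPU) where real Triton is unavailable. A minimal standalone sketch of that technique follows; the stub's attributes (`__version__`, `jit`) and the helper name are illustrative assumptions, not the actual vllm-ascend implementation:

```python
import sys
import types


def install_triton_placeholder() -> None:
    """Register a stub 'triton' module so later imports succeed.

    Illustrative sketch of the placeholder technique only; not the
    vllm-ascend or vllm implementation.
    """
    if "triton" in sys.modules:
        return  # a real (or previously stubbed) triton is already loaded
    stub = types.ModuleType("triton")
    stub.__version__ = "0.0.0"  # marker value so callers can detect the stub
    # triton.jit is the attribute most code touches; make it a no-op
    # decorator that works both bare (@jit) and with args (@jit(...)).
    stub.jit = lambda fn=None, **kwargs: fn if fn is not None else (lambda f: f)
    sys.modules["triton"] = stub


install_triton_placeholder()
import triton  # noqa: E402  # resolves to the stub instead of raising ImportError
```

Because `sys.modules` is consulted before any filesystem lookup, registering the stub first means every subsequent `import triton` in the process returns the placeholder, which is why such a patch must run early (e.g. from a plugin's `register_model`).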

vllm_ascend/patch/platform/__init__.py

Lines changed: 2 additions & 2 deletions

@@ -17,8 +17,8 @@
 from vllm_ascend.utils import vllm_version_is
 
 # Import specific patches for different versions
-if vllm_version_is("0.8.4"):
-    from vllm_ascend.patch.platform import patch_0_8_4  # noqa: F401
+if vllm_version_is("0.8.5"):
+    from vllm_ascend.patch.platform import patch_0_8_5  # noqa: F401
     from vllm_ascend.patch.platform import patch_common  # noqa: F401
 else:
     from vllm_ascend.patch.platform import patch_common  # noqa: F401
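The version gate in this diff (and the matching one in `patch/worker/__init__.py`) hinges on the `vllm_ascend.utils.vllm_version_is` helper. A rough standalone sketch of the gating pattern is below; the exact-match comparison and the explicit `installed` parameter are assumptions for illustration, since the real helper inspects the installed vllm package itself:

```python
def vllm_version_is(target: str, installed: str = "0.8.5") -> bool:
    """Exact-match version check (hypothetical sketch; the real helper
    lives in vllm_ascend.utils and reads the installed vllm version)."""
    return installed == target


def select_patches(installed: str) -> list[str]:
    """Mirror the import logic in patch/platform/__init__.py after this
    commit: a supported tagged release loads its version-specific patches
    plus the common ones; anything else (e.g. main) loads only common."""
    if vllm_version_is("0.8.5", installed):
        return ["patch_0_8_5", "patch_common"]
    return ["patch_common"]
```

This is why the upgrade touches both `__init__.py` gates and renames the `patch_0_8_4` package in the same commit: an out-of-date string here would silently skip the version-specific patches.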

vllm_ascend/patch/platform/patch_0_8_4/patch_config.py

Lines changed: 0 additions & 243 deletions
This file was deleted.

vllm_ascend/patch/platform/patch_0_8_4/__init__.py renamed to vllm_ascend/patch/platform/patch_0_8_5/__init__.py

Lines changed: 0 additions & 2 deletions

@@ -14,5 +14,3 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
-import vllm_ascend.patch.platform.patch_0_8_4.patch_config  # noqa

vllm_ascend/patch/worker/__init__.py

Lines changed: 2 additions & 2 deletions

@@ -18,8 +18,8 @@
 from vllm_ascend.utils import vllm_version_is
 
 # Import specific patches for different versions
-if vllm_version_is("0.8.4"):
-    from vllm_ascend.patch.worker import patch_0_8_4  # noqa: F401
+if vllm_version_is("0.8.5"):
+    from vllm_ascend.patch.worker import patch_0_8_5  # noqa: F401
     from vllm_ascend.patch.worker import patch_common  # noqa: F401
 else:
     from vllm_ascend.patch.worker import patch_common  # noqa: F401
