# each worker's `__init__` function.
#
# Then in each kind of patch, there are three folders:
- # - patch_0_8_4: contains the patches applied when the vllm version is 0.8.4.
+ # - patch_0_8_5: contains the patches applied when the vllm version is 0.8.5.
# - patch_main: contains the patches applied when the vllm version is the main branch.
- # - patch_common: contains the patches applied in both 0.8.4 and the main branch.
+ # - patch_common: contains the patches applied in both 0.8.5 and the main branch.
#
# In the future, as the vllm version is upgraded, new patch folders such as
# patch_0_8_5, patch_0_8_6, etc. will be added to manage the patches for different
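The version-based folder selection described above can be sketched as follows. This is a minimal illustration only: the function name `select_patch_folders` and the exact dispatch logic are hypothetical, not the project's actual API.

```python
# Hypothetical sketch of selecting patch folders by vllm version.
# `select_patch_folders` is an illustrative name, not vllm-ascend's real API.

def select_patch_folders(vllm_version: str) -> list[str]:
    """Return the patch folders to apply for a given vllm version."""
    folders = ["patch_common"]  # common patches are always applied
    if vllm_version == "0.8.5":
        folders.append("patch_0_8_5")
    else:
        # any other version string is treated as the main branch here
        folders.append("patch_main")
    return folders

print(select_patch_folders("0.8.5"))
```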
# --------------------------------
# * Platform Patch:
# =================
- # ** File: platform/patch_0_8_4/patch_config.py**
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- # 1. `vllm.config.ModelConfig.__init__()`
- #    Why:
- #      It is hard-coded that sleep mode supports the cuda platform only.
- #    How:
- #      Use a new method to check whether sleep mode is available.
- #    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support. 3. prepare to submit...
- #      https://github.com/vllm-project/vllm/pull/16562
- #    Future Plan:
- #      This patch is only used for 0.8.4 and can't be reverted; keep it as it is.
- #
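A patch like the sleep-mode one described above is a monkey patch that replaces a method at import time. The sketch below uses toy stand-ins (`ModelConfig`, `is_sleep_mode_available`, and the check logic are all simplified illustrations, not vllm's real code):

```python
# Toy stand-in for the sleep-mode patch: replace a constructor that
# hard-codes a CUDA-only check with one that asks the platform instead.

class ModelConfig:
    """Simplified stand-in for vllm's ModelConfig."""

    def __init__(self, enable_sleep_mode: bool = False):
        # Original behavior: hard-coded CUDA-only check.
        if enable_sleep_mode and not self._cuda_available():
            raise ValueError("Sleep mode is only supported on CUDA")
        self.enable_sleep_mode = enable_sleep_mode

    @staticmethod
    def _cuda_available() -> bool:
        return False  # pretend we are on a non-CUDA (e.g. NPU) platform


def is_sleep_mode_available() -> bool:
    # Illustrative: the platform reports its own capability.
    return True


def _patched_init(self, enable_sleep_mode: bool = False):
    # Relaxed check: consult the platform instead of hard-coding CUDA.
    if enable_sleep_mode and not is_sleep_mode_available():
        raise ValueError("Sleep mode is not supported on this platform")
    self.enable_sleep_mode = enable_sleep_mode


# Apply the monkey patch at import time, as the patch modules do.
ModelConfig.__init__ = _patched_init

cfg = ModelConfig(enable_sleep_mode=True)  # no longer raises off-CUDA
```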
# ** File: platform/patch_common/patch_distributed.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.distributed.parallel_state.destroy_model_parallel()`
#
# * Worker Patch:
# ===============
- # ** File: worker/patch_0_8_4/patch_metrics.py **
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- # 1. `vllm.spec_decode.metrics.AsyncMetricsCollector.init_tensors` and
- #    `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
- #    Why:
- #      There is cuda hard code (torch.cuda.Stream) in `AsyncMetricsCollector.init_tensors` and
- #      `AsyncMetricsCollector._copy_rejsample_metrics_async`.
- #    How:
- #      Replace it with the corresponding npu method.
- #    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support. 3. prepare to submit...
- #      https://github.com/vllm-project/vllm/pull/14411
- #    Future Plan:
- #      Revert it when the related pr is merged in vllm.
- #
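The shape of a "replace the CUDA hard code with the NPU equivalent" patch can be sketched with toy stand-ins. Everything below (`_CudaStream`, `_NpuStream`, the collector body) is illustrative; it is not torch's or vllm's real API:

```python
# Toy sketch: override a method that hard-codes a CUDA stream so it
# creates an NPU stream instead. All classes here are stand-ins.

class _CudaStream:
    device = "cuda"


class _NpuStream:
    device = "npu"


class AsyncMetricsCollector:
    """Stand-in for vllm's collector with a CUDA-only implementation."""

    def init_tensors(self):
        self._copy_stream = _CudaStream()  # hard-coded CUDA stream


def _patched_init_tensors(self):
    self._copy_stream = _NpuStream()  # corresponding NPU stream


# Swap in the NPU-aware method at import time.
AsyncMetricsCollector.init_tensors = _patched_init_tensors

collector = AsyncMetricsCollector()
collector.init_tensors()
```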
- # ** File: worker/patch_0_8_4/patch_spec_decode_worker.py **
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- # 1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker._configure_model_sampler_for_spec_decode`
- #    Why:
- #      vLLM `Remove Sampler from Model Code`, so vllm-ascend needs a patch to run in v0.8.4.
- #    How:
- #      Use the vLLM 0.8.4 method to patch it.
- #    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support. 3. prepare to submit...
- #      - https://github.com/vllm-project/vllm/pull/17084
- #      - https://github.com/vllm-project/vllm-ascend/pull/636
- #    Future Plan:
- #      Follow the v0.8.4 version strategy.
- #
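Patches like the spec-decode one above restore a method that upstream removed by attaching it back onto the class. A minimal sketch with a toy stand-in class and method body (neither is vllm's real code):

```python
# Toy sketch: re-attach a method that was removed upstream, so existing
# call sites keep working. `SpecDecodeWorker` here is a stand-in.

class SpecDecodeWorker:
    """Stand-in for vllm's SpecDecodeWorker after the method was removed."""


def _configure_model_sampler_for_spec_decode(self):
    # Stand-in body: the real patch re-applies the 0.8.4 sampler setup.
    self.sampler_configured = True


# Attach the method back onto the class at patch-import time.
SpecDecodeWorker._configure_model_sampler_for_spec_decode = (
    _configure_model_sampler_for_spec_decode)

worker = SpecDecodeWorker()
worker._configure_model_sampler_for_spec_decode()
```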
# ** File: worker/patch_common/patch_metrics.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
#      - https://github.com/vllm-project/vllm-ascend/pull/395
#    Future Plan:
#      Revert it when the related pr is merged in vllm and vllm-ascend.
- #
- # ** File: worker/patch_0_8_4/patch_tritonplaceholder.py **
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- # 1. `triton` Module
- #    Why:
- #      Triton is not supported on npu currently; importing triton will break vllm-ascend.
- #    How:
- #      ditto
- #    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support. 3. prepare to submit...
- #      TritonPlaceholder is only available in vllm > 0.8.4.
- #    Future Plan:
- #      Revert it when branch main doesn't maintain v0.8.4.
+ #
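A placeholder patch of the kind described above typically registers a dummy module in `sys.modules` before anything tries `import triton`, so the import succeeds on platforms without the real package. A minimal sketch in that spirit (the stubbed attributes are illustrative, not the exact TritonPlaceholder contents):

```python
import sys
import types

# Register a placeholder so `import triton` succeeds where the real
# package is unavailable. Attribute names here are illustrative only.
if "triton" not in sys.modules:
    _placeholder = types.ModuleType("triton")
    _placeholder.__version__ = "placeholder"
    # `triton.language` is commonly imported too, so stub the submodule.
    _placeholder.language = types.ModuleType("triton.language")
    sys.modules["triton"] = _placeholder
    sys.modules["triton.language"] = _placeholder.language

import triton  # resolves from sys.modules, not a real install
```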