Commit 538a69c

[Patch] format patch module to make it more clear (#601)
Format the patch module to make it clearer, and add the patch doc description; every new patch must follow this guide. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent ad845bf commit 538a69c

File tree

7 files changed: +136, -141 lines changed


vllm_ascend/patch/__init__.py

Lines changed: 133 additions & 1 deletion
@@ -13,4 +13,136 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ----------------------------------------------------------------------------------
# This module manages the patches for vllm. There are two folders in this module:
# - platform: contains the patches applied before the worker starts. They are applied by
#   `vllm_ascend.utils.adapt_patch(is_global_patch=True)` in the
#   `vllm_ascend.platform.NPUPlatform.pre_register_and_update()` function.
# - worker: contains the patches applied when the worker starts. They are applied by
#   `vllm_ascend.utils.adapt_patch(is_global_patch=False)` in
#   each worker's `__init__` function.
#
# Each kind of patch is further split into three folders:
# - patch_0_8_4: contains the patches applied when the vllm version is 0.8.4.
# - patch_main: contains the patches applied when the vllm version is the main branch.
# - patch_common: contains the patches applied to both 0.8.4 and the main branch.
#
# In the future, as the vllm version is upgraded, new patch folders such as
# patch_0_8_5, patch_0_8_6, etc. will be added to manage the patches for the
# different vllm versions, and patch_common will contain the patches applied to
# all supported vllm versions.
# Once a vllm version becomes too old for vllm-ascend to support, the related
# patch folder will be removed as well.
#
# Whenever a new patch is added to vllm-ascend, please add its description to this file as well.
# ----------------------------------------------------------------------------------
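Before the per-patch descriptions, here is a minimal sketch of how `adapt_patch` could route to the folders described above. It is illustrative only, not the actual `vllm_ascend.utils` implementation; in particular, the version check below is an assumption made for the sketch.

    import importlib

    def adapt_patch(is_global_patch: bool = False) -> None:
        # Platform patches are applied before the worker starts; worker patches
        # are applied inside each worker's __init__.
        scope = "platform" if is_global_patch else "worker"
        # Hypothetical version detection: treat a dev build as the main branch,
        # everything else as the pinned release (0.8.4 at the time of this commit).
        import vllm
        version_pkg = "patch_main" if "dev" in vllm.__version__ else "patch_0_8_4"
        # patch_common is applied for every supported vllm version.
        for pkg in ("patch_common", version_pkg):
            importlib.import_module(f"vllm_ascend.patch.{scope}.{pkg}")
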
# What's Patched and how it works:
# --------------------------------
# * Platform Patch:
# =================
# ** File: platform/patch_0_8_4/patch_config.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#   1. `vllm.config.ModelConfig.__init__()`
#      Why:
#         Sleep mode is hard coded to be supported on the cuda platform only.
#      How:
#         Use a new method to check whether sleep mode is available. (See the sketch after this section.)
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         https://github.com/vllm-project/vllm/pull/16562
#      Future Plan:
#         This patch is only used for 0.8.4 and can't be reverted; just keep it as it is.
#
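A rough sketch of the idea behind this patch follows. It is not the actual patch_config.py, and `supports_sleep_mode` is a hypothetical platform attribute used only for illustration: the patched constructor swaps the hard-coded CUDA test for a platform query along these lines.

    from vllm.platforms import current_platform

    def sleep_mode_available() -> bool:
        # Platform-aware replacement for the hard-coded "is this CUDA?" check;
        # `supports_sleep_mode` is a hypothetical platform attribute.
        return current_platform.is_cuda_alike() or getattr(
            current_platform, "supports_sleep_mode", False)

The patched `ModelConfig.__init__` then validates `enable_sleep_mode` against such a query instead of rejecting every non-CUDA platform.
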
# ** File: platform/patch_common/patch_distributed.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#   1. `vllm.distributed.parallel_state.destroy_model_parallel()`
#      Why:
#         vllm does not support an outside platform maintaining its own `CoordinatorGroup`. vllm-ascend maintains
#         EP and ETP inside this repo and needs a common interface to destroy them. This patch adds an interface
#         that destroys the platform-owned `CoordinatorGroup`s so that all of them are properly destroyed.
#      How:
#         Call the platform method `destroy_platform_model_parallel` to destroy all the platform-owned
#         `CoordinatorGroup`s. (See the sketch after this section.)
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         No related PR; we want to add this ability to vllm.
#      Future Plan:
#         Remove this patch when vllm merges it.
#   2. `vllm.distributed.stateless_init_torch_distributed_process_group()`
#      Why:
#         The stateless process group can only be initialized with the gloo and nccl backends. vllm-ascend
#         needs to initialize its own stateless process group for communication, so we add a platform-specific
#         call to `stateless_init_torch_distributed_process_group` to enable other platforms that provide a
#         stateless process group initialization method.
#      How:
#         Call the platform method `platform_has_backend_register` to check whether a stateless process group
#         initialization method exists, and call the platform method `platform_register_backend` to initialize it.
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         No related PR; we want to add this ability to vllm.
#      Future Plan:
#         Remove this patch when vllm merges it.
#   3. `ParallelConfig.get_next_dp_init_port`
#      Why:
#         We want to get the dp port from an environment variable so that multi-node inference can be properly
#         initialized and run.
#      How:
#         Get the dp port from an environment variable to enable multi-node dp inference.
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         No related PR; we want to add this ability to vllm.
#      Future Plan:
#         This is a workaround in vllm-ascend to enable multi-node dp inference; it may be removed once vllm has a
#         better plan for its multi-node dp inference implementation.
#
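The sketch below shows the wrapping pattern behind item 1 above. It is illustrative only: the real patch_common/patch_distributed.py defines its own `ascend_destroy_model_parallel`, and `destroy_platform_model_parallel` is the platform hook named in the description.

    import vllm.distributed.parallel_state as parallel_state
    from vllm.platforms import current_platform

    _original_destroy = parallel_state.destroy_model_parallel

    def patched_destroy_model_parallel() -> None:
        # Tear down the groups vllm itself owns, then let the platform destroy
        # the CoordinatorGroups (EP/ETP) it maintains on its own.
        _original_destroy()
        hook = getattr(current_platform, "destroy_platform_model_parallel", None)
        if hook is not None:
            hook()

    parallel_state.destroy_model_parallel = patched_destroy_model_parallel
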
# * Worker Patch:
# ===============
# ** File: worker/patch_common/patch_metrics.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#   1. `vllm.spec_decode.metrics.AsyncMetricsCollector.init_tensors` and
#      `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
#      Why:
#         There is hard-coded cuda code (torch.cuda.Stream) in `AsyncMetricsCollector.init_tensors` and
#         `AsyncMetricsCollector._copy_rejsample_metrics_async`.
#      How:
#         Replace it with the corresponding npu method. (See the sketch after this section.)
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         https://github.com/vllm-project/vllm/pull/14411
#      Future Plan:
#         Revert it when the related pr is merged in vllm.
#
#   2. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
#      Why:
#         There is hard-coded cuda code (current_platform.is_cuda_alike()) in
#         `AsyncMetricsCollector.maybe_collect_rejsample_metrics`.
#      How:
#         Change it to use `current_platform.Event` to determine whether to return None.
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         https://github.com/vllm-project/vllm/pull/14411
#      Future Plan:
#         Revert it when the related pr is merged in vllm.
#
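For item 1 above, the shape of the replacement is roughly the following. This is a trimmed sketch rather than the real patch_metrics.py: the upstream method does more bookkeeping than shown, and the snippet assumes `torch_npu` is installed so that `torch.npu.Stream` exists.

    import torch
    import torch_npu  # noqa: F401  # registers the torch.npu device module
    from vllm.spec_decode.metrics import AsyncMetricsCollector

    def _npu_init_tensors(self, rank, device_type="npu") -> None:
        # Same role as the upstream method, but with the hard-coded
        # torch.cuda.Stream swapped for an NPU stream.
        self._rank = rank
        self._copy_stream = torch.npu.Stream()

    AsyncMetricsCollector.init_tensors = _npu_init_tensors
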
# ** File: worker/patch_common/patch_multi_step_worker.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#   1. `vllm.spec_decode.multi_step_worker.MultiStepWorker.sampler_output`
#      Why:
#         There is hard-coded cuda code (current_platform.is_cuda_alike()) in
#         `MultiStepWorker.sampler_output`, and we need to use the patched `TP1DraftModelRunner` in it.
#      How:
#         Make speculative decoding extensible to different backends:
#         - support registering attention metadata into the set of supported spec decode backends;
#         - offer an API on the platform to determine whether spec decode is supported,
#           and deprecate is_cuda_alike in it.
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         - https://github.com/vllm-project/vllm/pull/15195
#         - https://github.com/vllm-project/vllm-ascend/pull/395
#      Future Plan:
#         Revert it when the related prs are merged in vllm and vllm-ascend.
#
# ** File: worker/patch_common/patch_spec_decode_worker.py **
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#   1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker.create_worker`
#      Why:
#         We need to use the patched `TP1DraftModelRunner` in `SpecDecodeWorker.create_worker`.
#         The main reason to overwrite `TP1DraftModelRunner` is the hard-coded
#         `FlashAttentionMetadata`.
#      How:
#         Ditto (same approach as patch_multi_step_worker.py above; see the sketch after this section).
#      Related PR (if no related PR, explain why - e.g. refused by vllm, not yet supported by vllm, or to be submitted):
#         - https://github.com/vllm-project/vllm/pull/15195
#         - https://github.com/vllm-project/vllm-ascend/pull/395
#      Future Plan:
#         Revert it when the related prs are merged in vllm and vllm-ascend.
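Both worker patches above ultimately substitute vllm-ascend's draft model runner for the upstream one. A minimal sketch of that substitution pattern follows; the import path `vllm_ascend.worker.draft_model_runner` is an assumption made for illustration, and the actual patches re-implement the affected methods rather than only rebinding a name.

    from vllm.spec_decode import multi_step_worker, spec_decode_worker

    # Hypothetical location of the Ascend draft runner that avoids the
    # FlashAttentionMetadata hard code.
    from vllm_ascend.worker.draft_model_runner import TP1DraftModelRunner

    # Rebind the name the upstream modules look up at call time so that
    # MultiStepWorker.sampler_output and SpecDecodeWorker.create_worker pick up
    # the patched runner.
    multi_step_worker.TP1DraftModelRunner = TP1DraftModelRunner
    spec_decode_worker.TP1DraftModelRunner = TP1DraftModelRunner
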

vllm_ascend/patch/platform/patch_0_8_4/__init__.py

Lines changed: 0 additions & 12 deletions
@@ -14,17 +14,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
# What's Patched and how it works:
# ** File: platform/patch_0_8_4/patch_config.py**
# 1. `vllm.config.ModelConfig.__init__()`
#    Why:
#       It is hard coded for sleep mode to support cuda platform only
#    How:
#       Using a new method to check if sleep mode is available
#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
#       https://github.com/vllm-project/vllm/pull/16562
#    Future Plan:
#       This patch is only used for 084 and can't be revert. just keep as it is.

import vllm_ascend.patch.platform.patch_0_8_4.patch_config # noqa
import vllm_ascend.patch.platform.patch_0_8_4.patch_distributed # noqa

vllm_ascend/patch/platform/patch_common/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -13,4 +13,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import vllm_ascend.patch.platform.patch_common.patch_distributed # noqa

vllm_ascend/patch/platform/patch_0_8_4/patch_distributed.py renamed to vllm_ascend/patch/platform/patch_common/patch_distributed.py

Lines changed: 0 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -27,40 +27,6 @@
2727
from torch.distributed.rendezvous import rendezvous
2828
from vllm.config import ParallelConfig
2929

30-
# What's Patched and how it works:
31-
# ** File: platform/patch_0_8_4/patch_distributed.py**
32-
# 1. `vllm.distributed.parallel_state.destroy_model_parallel()`
33-
# Why:
34-
# vllm dose not support outside platform maintain its own `CoordinatorGroup`, vllm-ascend maintain EP and ETP
35-
# inside of the repo, and needs a common interface to destroy them, this patch add the interface of destroy
36-
# platform owned `CoordinatorGroup` to make sure all the CoordinateGroup can be properly destroyed
37-
# How:
38-
# Call platform method `destroy_platform_model_parallel` to destroy all the `CoordinateGroup`
39-
# Related PR (if no, explain why): no related PR, we want add this ability into vllm
40-
# Future Plan:
41-
# Remove those patch when vllm merged them
42-
# 2. `vllm.distributed.stateless_init_torch_distributed_process_group()`
43-
# Why:
44-
# The stateless process group can not be initialized except from gloo and nccl backend, vllm-ascend
45-
# needs to initialize its own stateless process group for communication, so we add the platform related
46-
# call to the `stateless_init_torch_distributed_process_group`, to enable other platform which may support
47-
# stateless process group initialize method
48-
# How:
49-
# Call platform method `platform_has_backend_register` to judge if there is a stateless process group initialize
50-
# method and call platform method `platform_register_backend` to initialize them
51-
# Related PR (if no, explain why): no related PR, we want add this ability into vllm
52-
# Future Plan:
53-
# Remove those patch when vllm merged them
54-
# 3. `ParallelConfig.get_next_dp_init_port`
55-
# Why:
56-
# We want to get dp port from env variable, so the multi-node inference can be properly initialized and run.
57-
# How:
58-
# Get the dp port from env variable enable multi-mode dp inference
59-
# Related PR (if no, explain why): no related PR, we want add this ability into vllm
60-
# Future Plan:
61-
# Its a workaround in vllm-ascend to enable multi-node dp inference, maybe removed if vllm have better plan
62-
# on multi-node dp inference implementation
63-
6430

6531
def ascend_destroy_model_parallel():
6632
"""Set the groups to none and destroy them."""

vllm_ascend/patch/platform/patch_main/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -14,4 +14,3 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import vllm_ascend.patch.platform.patch_main.patch_distributed # noqa F401

vllm_ascend/patch/platform/patch_main/patch_distributed.py

Lines changed: 0 additions & 32 deletions
This file was deleted.

vllm_ascend/patch/worker/patch_common/__init__.py

Lines changed: 0 additions & 60 deletions
@@ -15,66 +15,6 @@
# limitations under the License.
#

# What's Patched and how it works:
# ** File: worker/patch_common/patch_metrics.py **
# 1. `vllm.spec_decode.metrics.AsyncMetricsCollector.init_tensors` and
#    `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
#    Why:
#       There are cuda hard code (torch.cuda.Stream) in `AsyncMetricsCollector.init_tensors` and
#       `AsyncMetricsCollector._copy_rejsample_metrics_async`
#    How:
#       Replace it with the corresponding npu method
#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
#       https://github.com/vllm-project/vllm/pull/14411
#    Future Plan:
#       Revert it when the related pr is merged in vllm.
#
# 2. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
#    Why:
#       There are cuda hard code (current_platform.is_cuda_alike()) in
#       `AsyncMetricsCollector.maybe_collect_rejsample_metrics`
#    How:
#       Change to use `current_platform.Event` to determine whether to return None
#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
#       https://github.com/vllm-project/vllm/pull/14411
#    Future Plan:
#       Revert it when the related pr is merged in vllm.
#
# ** File: worker/patch_common/patch_multi_step_worker.py **
# 1. `vllm.spec_decode.multi_step_worker.MultiStepWorker.sampler_output`
#    Why:
#       There are cuda hard code (current_platform.is_cuda_alike()) in
#       `MultiStepWorker.sampler_output`, and we need to use the patched `TP1DraftModelRunner` in it.
#    How:
#       Make speculative decoding extensible to different backends.
#       - support attention metadata register to the set supported spec decode
#       - offer a api in platform to determine whether spec decode is supported,
#         and deprecate is_cuda_alike in it.
#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
#       - https://github.com/vllm-project/vllm/pull/15195
#       - https://github.com/vllm-project/vllm-ascend/pull/395
#    Future Plan:
#       Revert it when the related pr is merged in vllm and vllm-ascend.
#
# ** File: worker/patch_common/patch_multi_step_worker.py **
# 1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker.create_worker`
#    Why:
#       We need to use the patched `TP1DraftModelRunner` in `SpecDecodeWorker.create_worker`.
#       The mainly reason to overwrite `TP1DraftModelRunner`is the hard code of
#       `FlashAttentionMetadata`
#    How:
#       ditto
#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
#       - https://github.com/vllm-project/vllm/pull/15195
#       - https://github.com/vllm-project/vllm-ascend/pull/395
#    Future Plan:
#       Revert it when the related pr is merged in vllm and vllm-ascend.

# current_platform.is_cuda_alike()
# 0.8.4 patch doc:
# platform-0.8.4 + platform-common + worker-0.8.4 + worker-common
# ...

import vllm_ascend.patch.worker.patch_common.patch_metrics # noqa
import vllm_ascend.patch.worker.patch_common.patch_multi_step_worker # noqa
import vllm_ascend.patch.worker.patch_common.patch_spec_decode_worker # noqa
