From f5b73f02ea05d8cfdb090d1963dac55e4446697e Mon Sep 17 00:00:00 2001 From: MengqingCao Date: Mon, 24 Mar 2025 02:02:31 +0000 Subject: [PATCH 1/7] Add hardware plugin blog post Signed-off-by: MengqingCao --- _posts/2025-03-12-hardware-plugin.md | 115 +++++++++++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 _posts/2025-03-12-hardware-plugin.md diff --git a/_posts/2025-03-12-hardware-plugin.md b/_posts/2025-03-12-hardware-plugin.md new file mode 100644 index 0000000..c0f8af3 --- /dev/null +++ b/_posts/2025-03-12-hardware-plugin.md @@ -0,0 +1,115 @@ +--- +layout: post +title: "Introducing vLLM Hardware Plugin and Best Practice with Ascend NPU" +author: "vLLM Ascend Team" +image: /assets/logos/vllm-logo-only-light.png +--- + +Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. The RFC has now taken initial shape. +This proposal enables hardware integration into vLLM in a decoupled way, allowing for quick and modular support of various hardware platforms. + +--- + +## Why vLLM Hardware Plugin? + +Currently, vLLM already supports multiple backends. However, as the number of vLLM backends continues to grow, several challenges have emerged: + +- **Increased Code Complexity**: Each hardware backend has its own `Executor`, `Worker`, `Runner`, and `Attention` components. This has increased the complexity of the vLLM codebase, with non-generic backend-specific code scattered throughout the project. +- **High Maintenance Costs**: The cost of maintaining backends is high, not only for the backend developers but also for the vLLM community. 
The scarcity of community contributor resources makes efficiently adding new features difficult when backend maintainers are not present. +- **Lack of Extensibility**: While vLLM follows a well-structured layered design by implementing backends through `Executor`, `Worker`, `Runner`, and `Attention`, supporting new hardware often requires invasive modifications or patching rather than dynamic registration. This makes adding new backends cumbersome. + +Recognizing the need for a flexible and modular approach to integrating hardware backends, we identified hardware pluginization as a feasible solution: + +- **Decoupled Codebase**: The hardware backend plugin code remains independent, making the vLLM core code cleaner. +- **Reduced Maintenance Burden**: vLLM developers can focus on generic features without being overwhelmed by the differences caused by backend-specific implementations. +- **Faster Integration & More Independent**: New backends can be integrated quickly with less work to do and evolve independently. + +--- + +## What is the vLLM Hardware Plugin? + +Before introducing the vLLM Hardware Plugin, let's first look at two prerequisite RFCs: + +- [[RFC] vLLM Plugin System](https://github.com/vllm-project/vllm/issues/7131): This RFC introduces a plugin-based approach to support various customization requirements, allowing users to define custom models, executors, schedulers, etc. +- [[RFC] Make vLLM Device-Agnostic for Diverse Hardware Support](https://github.com/vllm-project/vllm/issues/9268) and ([vllm-project/vllm#6080](https://github.com/vllm-project/vllm/pull/6080)): This RFC introduces the **platform** submodule, which centralizes hardware-related implementations to reduce conditional logic in the main codebase and lays the foundation for modularization. + +Based on these RFCs, we proposed [[RFC] Hardware Pluggable](https://github.com/vllm-project/vllm/issues/11162), which integrates the `Platform` module into vLLM as a plugin. 
Additionally, we refactored `Executor`, `Worker`, `ModelRunner`, `AttentionBackend`, and `Communicator` to support hardware plugins more flexibly. + +Currently, vLLM community has successfully implemented the Platform module introduced in the RFC. The functionality is validated through the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) and [vllm-project/vllm-spyre](https://github.com/vllm-project/vllm-spyre) projects. Using this plugin mechanism, we successfully integrated vLLM with the Ascend NPU and IBM Spyre backends. + +--- + +## How to Integrate a New Backend via vLLM Hardware Plugin Mechanism + +This section will dive into integrating a New Backend via the Hardware Plugin in both developer and user perspective. + +### Developer Perspective + +To integrate a new backend into vLLM using the Hardware Plugin, follow these steps: + +#### Step 1: Create a New Project and Initialize the Platform + +Start by creating a Python project for the new backend and adding a `platform.py` file. Then, import the `Platform` class from `vllm.platforms` and implement the required attributes and methods. + +You can refer to the [`platform.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d8d2193dddbfcc60578fd642008225a5/vllm_ascend/platform.py#L52) in vLLM Ascend project for an example. + +#### Step 2: Implement Custom Worker, Model Runner, Attention Backend, and Communicator Modules + +Depending on the new backend's requirements, implement the following modules: + +```python +from vllm.worker.worker_base import WorkerBase +from vllm.worker.model_runner_base import ModelRunnerBase +from vllm.attention.backends.abstract import AttentionBackend +from vllm.distributed.device_communicators.base_communicator import CommunicatorBase +``` + +Each of these classes has a corresponding base class in vLLM. Again, you can refer to [vLLM Ascend's implementation](https://github.com/vllm-project/vllm-ascend/tree/main/vllm_ascend) for an example. 
+ +#### Step 3: Register the Plugin + +Register the plugin in `setup.py` using entrypoint mechanism of python: + +```python +setup( + entry_points={'vllm.platform_plugins': ["{your_platform_name} = {code_path}:{register_function}"]} +) +``` + +- `{your_platform_name}`: The name of the new backend (can be arbitrary). +- `{code_path}`: The path to the main Python module. +- `{register_function}`: The register function, which returns the path of `Platform` class defined in step 1. + +Refer to [`setup.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d8d2193dddbfcc60578fd642008225a5/setup.py#L102) in vLLM Ascend for a practical example. + +--- + +### User Perspective + +Only need to install vllm and your plugin before running, taking [vllm-ascend](https://github.com/vllm-project/vllm-ascend) as an example: + +```bash +pip install vllm vllm-ascend +``` + +On startup, you will observe the following logs, which means the backend plugin is working properly: + +```bash +INFO 02-06 15:49:01 __init__.py:30] Available plugins for group vllm.platform_plugins: +INFO 02-06 15:49:01 __init__.py:32] name=ascend, value=vllm_ascend:register +… … +INFO 02-06 15:49:01 __init__.py:44] plugin ascend loaded. +INFO 02-06 15:49:01 __init__.py:181] Platform plugin ascend is activated +``` + +--- + +## What's Next? + +Moving forward, we will continue collaborating with developers in the vLLM community to enhance the following aspects: + +1. Continuous enhancements to the V1 Engine. +2. Expanding plugin support for more modules and features, such as scheduler and custom operators. +3. Better user experience and higher performance. + +We encourage everyone to try out this new feature! If you have any questions, join the [vLLM Slack](https://inviter.co/vllm-slack) and participate in the **#sig-extensible-hardware** channel for discussions. 
🚀 From 7d707946739a22a8003ab284ae31be9a20a424d8 Mon Sep 17 00:00:00 2001 From: youkaichao Date: Mon, 12 May 2025 16:12:50 +0800 Subject: [PATCH 2/7] rename to updated date Signed-off-by: youkaichao --- ...025-03-12-hardware-plugin.md => 2025-05-12-hardware-plugin.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename _posts/{2025-03-12-hardware-plugin.md => 2025-05-12-hardware-plugin.md} (100%) diff --git a/_posts/2025-03-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md similarity index 100% rename from _posts/2025-03-12-hardware-plugin.md rename to _posts/2025-05-12-hardware-plugin.md From 39caae1b1486cbcf6331c2e7cf2caa7decf8d18d Mon Sep 17 00:00:00 2001 From: youkaichao Date: Mon, 12 May 2025 16:32:33 +0800 Subject: [PATCH 3/7] minor update Signed-off-by: youkaichao --- _posts/2025-05-12-hardware-plugin.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/_posts/2025-05-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md index c0f8af3..90e213f 100644 --- a/_posts/2025-05-12-hardware-plugin.md +++ b/_posts/2025-05-12-hardware-plugin.md @@ -1,12 +1,11 @@ --- layout: post -title: "Introducing vLLM Hardware Plugin and Best Practice with Ascend NPU" -author: "vLLM Ascend Team" +title: "Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU" +author: "The Ascend Team on vLLM" image: /assets/logos/vllm-logo-only-light.png --- -Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. The RFC has now taken initial shape. -This proposal enables hardware integration into vLLM in a decoupled way, allowing for quick and modular support of various hardware platforms. 
+Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. --- @@ -18,7 +17,7 @@ Currently, vLLM already supports multiple backends. However, as the number of vL - **High Maintenance Costs**: The cost of maintaining backends is high, not only for the backend developers but also for the vLLM community. The scarcity of community contributor resources makes efficiently adding new features difficult when backend maintainers are not present. - **Lack of Extensibility**: While vLLM follows a well-structured layered design by implementing backends through `Executor`, `Worker`, `Runner`, and `Attention`, supporting new hardware often requires invasive modifications or patching rather than dynamic registration. This makes adding new backends cumbersome. -Recognizing the need for a flexible and modular approach to integrating hardware backends, we identified hardware pluginization as a feasible solution: +Recognizing the need for a flexible and modular approach to integrating hardware backends, we proposed hardware plugins as a feasible solution: - **Decoupled Codebase**: The hardware backend plugin code remains independent, making the vLLM core code cleaner. - **Reduced Maintenance Burden**: vLLM developers can focus on generic features without being overwhelmed by the differences caused by backend-specific implementations. @@ -112,4 +111,4 @@ Moving forward, we will continue collaborating with developers in the vLLM commu 2. Expanding plugin support for more modules and features, such as scheduler and custom operators. 3. Better user experience and higher performance. -We encourage everyone to try out this new feature! 
If you have any questions, join the [vLLM Slack](https://inviter.co/vllm-slack) and participate in the **#sig-extensible-hardware** channel for discussions. 🚀 +We encourage everyone to try out this new feature! If you have any questions, join the [vLLM Slack](https://slack.vllm.ai) and participate in the **#sig-extensible-hardware** channel for discussions. 🚀 From 20922255fd798330b647ff1dc4a95e2cca9fbfd6 Mon Sep 17 00:00:00 2001 From: Mengqing Cao Date: Wed, 14 May 2025 00:03:20 +0800 Subject: [PATCH 4/7] update next step and acknowledgement Signed-off-by: Mengqing Cao --- _posts/2025-05-12-hardware-plugin.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/_posts/2025-05-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md index 90e213f..5ee0d8a 100644 --- a/_posts/2025-05-12-hardware-plugin.md +++ b/_posts/2025-05-12-hardware-plugin.md @@ -1,7 +1,7 @@ --- layout: post title: "Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU" -author: "The Ascend Team on vLLM" +author: "The vLLM Ascend team" image: /assets/logos/vllm-logo-only-light.png --- @@ -79,7 +79,7 @@ setup( - `{code_path}`: The path to the main Python module. - `{register_function}`: The register function, which returns the path of `Platform` class defined in step 1. -Refer to [`setup.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d8d2193dddbfcc60578fd642008225a5/setup.py#L102) in vLLM Ascend for a practical example. +Refer to [`setup.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d8d2193dddbfcc60578fd642008225a5/setup.py#L102) in vLLM Ascend for a practical example. --- @@ -107,8 +107,14 @@ INFO 02-06 15:49:01 __init__.py:181] Platform plugin ascend is activated Moving forward, we will continue collaborating with developers in the vLLM community to enhance the following aspects: -1. Continuous enhancements to the V1 Engine. -2. 
Expanding plugin support for more modules and features, such as scheduler and custom operators. +1. Continuous enhancements to the V1 Engine and VLMs. +2. Expanding plugin support for more modules and features, such as scheduler, graph mode and custom operators. 3. Better user experience and higher performance. +4. Maintenance and enhancement of a stable plugin architecture for appropriate hardware platforms We encourage everyone to try out this new feature! If you have any questions, join the [vLLM Slack](https://slack.vllm.ai) and participate in the **#sig-extensible-hardware** channel for discussions. 🚀 + + +## Acknowledgements + +This flexible hardware backend plugin mechanism would not have been possible without the efforts contributed by a lot of vLLM contributors. Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jee Jee Li](https://github.com/jeejeelee) for related refactor, deeply discuss and quickly review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from vLLM Ascend team for mechanism design and implentment, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the vLLM Spyre team for pluggable scheduler design and implentment, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implentment, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`. 
From bea7a2f9e36f60712708bff15e09b379569cc6d3 Mon Sep 17 00:00:00 2001 From: youkaichao Date: Wed, 14 May 2025 13:25:47 +0800 Subject: [PATCH 5/7] rename the team Signed-off-by: youkaichao --- _posts/2025-05-12-hardware-plugin.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2025-05-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md index 5ee0d8a..c71055c 100644 --- a/_posts/2025-05-12-hardware-plugin.md +++ b/_posts/2025-05-12-hardware-plugin.md @@ -1,11 +1,11 @@ --- layout: post title: "Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU" -author: "The vLLM Ascend team" +author: "The Ascend Team on vLLM" image: /assets/logos/vllm-logo-only-light.png --- -Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. +Since December 2024, through the joint efforts of the vLLM community and the Ascend team on vLLM, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. --- From aec0a25c0220695aaba7d2f0fbf07746b9d4f18b Mon Sep 17 00:00:00 2001 From: youkaichao Date: Wed, 14 May 2025 13:28:03 +0800 Subject: [PATCH 6/7] fix typo Signed-off-by: youkaichao --- _posts/2025-05-12-hardware-plugin.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-05-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md index c71055c..0b64340 100644 --- a/_posts/2025-05-12-hardware-plugin.md +++ b/_posts/2025-05-12-hardware-plugin.md @@ -117,4 +117,4 @@ We encourage everyone to try out this new feature! 
If you have any questions, jo ## Acknowledgements -This flexible hardware backend plugin mechanism would not have been possible without the efforts contributed by a lot of vLLM contributors. Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jee Jee Li](https://github.com/jeejeelee) for related refactor, deeply discuss and quickly review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from vLLM Ascend team for mechanism design and implentment, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the vLLM Spyre team for pluggable scheduler design and implentment, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implentment, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`. +This flexible hardware backend plugin mechanism would not have been possible without the efforts contributed by a lot of vLLM contributors. 
Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jie Li](https://github.com/jeejeelee) for related refactor, deep discussion and quick review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from the Ascend team on vLLM for mechanism design and implementation, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the Spyre team on vLLM for pluggable scheduler design and implementation, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implementation, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`. From 6d39fd4090a44c10e58dc72676be0e59ff871a09 Mon Sep 17 00:00:00 2001 From: youkaichao Date: Wed, 14 May 2025 13:37:42 +0800 Subject: [PATCH 7/7] fix typo Signed-off-by: youkaichao --- _posts/2025-05-12-hardware-plugin.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_posts/2025-05-12-hardware-plugin.md b/_posts/2025-05-12-hardware-plugin.md index 0b64340..283a50c 100644 --- a/_posts/2025-05-12-hardware-plugin.md +++ b/_posts/2025-05-12-hardware-plugin.md @@ -5,7 +5,7 @@ author: "The Ascend Team on vLLM" image: /assets/logos/vllm-logo-only-light.png --- -Since December 2024, through the joint efforts of the vLLM community and the Ascend team on vLLM, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. 
+Since December 2024, through the joint efforts of the vLLM community and the Ascend team on vLLM, we have completed the [Hardware Pluggable RFC](https://github.com/vllm-project/vllm/issues/11162). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. --- @@ -34,17 +34,17 @@ Before introducing the vLLM Hardware Plugin, let's first look at two prerequisit Based on these RFCs, we proposed [[RFC] Hardware Pluggable](https://github.com/vllm-project/vllm/issues/11162), which integrates the `Platform` module into vLLM as a plugin. Additionally, we refactored `Executor`, `Worker`, `ModelRunner`, `AttentionBackend`, and `Communicator` to support hardware plugins more flexibly. -Currently, vLLM community has successfully implemented the Platform module introduced in the RFC. The functionality is validated through the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) and [vllm-project/vllm-spyre](https://github.com/vllm-project/vllm-spyre) projects. Using this plugin mechanism, we successfully integrated vLLM with the Ascend NPU and IBM Spyre backends. +Currently, the vLLM community has successfully implemented the Platform module introduced in the RFC. The functionality is validated through the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) and [vllm-project/vllm-spyre](https://github.com/vllm-project/vllm-spyre) projects. Using this plugin mechanism, we successfully integrated vLLM with the Ascend NPU and IBM Spyre backends. --- ## How to Integrate a New Backend via vLLM Hardware Plugin Mechanism -This section will dive into integrating a New Backend via the Hardware Plugin in both developer and user perspective. +This section will dive into integrating a new backend via the hardware plugin from both the developer and user perspectives.
### Developer Perspective -To integrate a new backend into vLLM using the Hardware Plugin, follow these steps: +To integrate a new backend into vLLM using the hardware plugin, follow these steps: #### Step 1: Create a New Project and Initialize the Platform @@ -67,7 +67,7 @@ Each of these classes has a corresponding base class in vLLM. Again, you can ref #### Step 3: Register the Plugin -Register the plugin in `setup.py` using entrypoint mechanism of python: +Register the plugin in `setup.py` using Python's entry point mechanism: ```python setup( @@ -85,7 +85,7 @@ Refer to [`setup.py`](https://github.com/vllm-project/vllm-ascend/blob/72a43a61d ### User Perspective -Only need to install vllm and your plugin before running, taking [vllm-ascend](https://github.com/vllm-project/vllm-ascend) as an example: +Users only need to install vllm and your plugin before running, taking [vllm-ascend](https://github.com/vllm-project/vllm-ascend) as an example: ```bash pip install vllm vllm-ascend @@ -117,4 +117,4 @@ We encourage everyone to try out this new feature! If you have any questions, jo ## Acknowledgements -This flexible hardware backend plugin mechanism would not have been possible without the efforts contributed by a lot of vLLM contributors. 
Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jie Li](https://github.com/jeejeelee) for related refactor, deep discussion and quick review, [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from the Ascend team on vLLM for mechanism design and implementation, [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the Spyre team on vLLM for pluggable scheduler design and implementation, and other contributors, including [yancong](https://github.com/ice-tong) for extendable quantization method design and implementation, [Aviv Keshet](https://github.com/akeshet) for extendable `SamplingParams`. +This flexible hardware backend plugin mechanism would not have been possible without the efforts of many vLLM contributors. 
Thus we are deeply grateful to the vLLM maintainers, including [Kaichao You](https://github.com/youkaichao), [Simon Mo](https://github.com/simon-mo), [Cyrus Leung](https://github.com/DarkLight1337), [Robert Shaw](https://github.com/robertgshaw2-redhat), [Michael Goin](https://github.com/mgoin) and [Jie Li](https://github.com/jeejeelee) for the related refactoring, in-depth discussions and quick reviews; to [Xiyuan Wang](https://github.com/wangxiyuan), [Shanshan Shen](https://github.com/shen-shanshan), [Chenguang Li](https://github.com/noemotiovon) and [Mengqing Cao](https://github.com/MengqingCao) from the Ascend team on vLLM for the mechanism design and implementation; to [Joe Runde](https://github.com/joerunde) and [Yannick Schnider](https://github.com/yannicks1) from the Spyre team on vLLM for the pluggable scheduler design and implementation; and to other contributors, including [yancong](https://github.com/ice-tong) for the extendable quantization method design and implementation and [Aviv Keshet](https://github.com/akeshet) for the extendable `SamplingParams`.
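
As an illustration of the registration step described in Step 3 of the post, here is a minimal sketch of how a `vllm.platform_plugins` entry point string could be resolved into a `Platform` class. It uses only the Python standard library and does not import vLLM. The names `my_backend_plugin`, `MyPlatform`, and `resolve_platform` are hypothetical, and the resolution logic is an assumption about how a register function that returns a class path can be consumed, not vLLM's actual loader.

```python
import importlib
import sys
import types
from importlib.metadata import EntryPoint


class MyPlatform:
    """Hypothetical stand-in for a subclass of vllm.platforms.Platform."""

    device_name = "my_device"


def register() -> str:
    """Register function: returns the dotted path of the Platform class."""
    return "my_backend_plugin.MyPlatform"


# Build the hypothetical plugin package in memory so the sketch is
# self-contained. A real plugin would be an installed Python package.
plugin = types.ModuleType("my_backend_plugin")
plugin.MyPlatform = MyPlatform
plugin.register = register
sys.modules["my_backend_plugin"] = plugin

# Mirrors the setup.py entry:
#   "{your_platform_name} = {code_path}:{register_function}"
entry_point = EntryPoint(
    name="my_backend",
    value="my_backend_plugin:register",
    group="vllm.platform_plugins",
)


def resolve_platform(ep: EntryPoint) -> type:
    """Load the register function, call it, and import the class it names."""
    register_fn = ep.load()
    qualified_name = register_fn()
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


platform_cls = resolve_platform(entry_point)
print(platform_cls.device_name)
```

Returning a path string rather than the class itself means the register function can stay cheap to call: the module that defines the platform is imported only when the path is resolved.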