@@ -83,7 +83,8 @@ based on assigned priority, with FCFS as a tie-breaker), configurable via the
| **Decoder-only Models** | <nobr>🚀 Optimized</nobr> |
| **Encoder-Decoder Models** | <nobr>🟠 Delayed</nobr> |
| **Embedding Models** | <nobr>🟢 Functional</nobr> |
- | **Mamba Models** | <nobr>🚧 WIP ([PR #19327](https://github.com/vllm-project/vllm/pull/19327))</nobr> |
+ | **Mamba Models** | <nobr>🟢 Functional</nobr> |
+ | **Hybrid Models** | <nobr>🟢 Functional</nobr> |
| **Multimodal Models** | <nobr>🟢 Functional</nobr> |
vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
@@ -104,8 +105,16 @@ to enable simultaneous generation and embedding using the same engine instance i
#### Mamba Models
- Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
- will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
+ Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
+ Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
+ (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
+ enforcing eager mode and disabling prefix caching in V1.
+
+ #### Hybrid Models
+
+ Models that combine Mamba-2 layers with standard transformer attention layers are supported (e.g., `BambaForCausalLM`,
+ `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
+ these models currently require enforcing eager mode and disabling prefix caching in V1.
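
For reference, here is a minimal sketch (not part of this PR) of how one might apply the constraints described above for a Mamba-2 or hybrid model, using vLLM's offline `LLM` API. It assumes the V1 engine is selected (e.g., via `VLLM_USE_V1=1`) and uses a placeholder checkpoint name; the `enforce_eager` and `enable_prefix_caching` engine arguments are the standard knobs assumed to apply here:

```python
from vllm import LLM, SamplingParams

# Sketch only: the checkpoint name is a placeholder; substitute any supported
# Mamba-2 or hybrid (Mamba-2 + attention) model.
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # placeholder hybrid checkpoint
    enforce_eager=True,                # V1 currently requires eager mode for these models
    enable_prefix_caching=False,       # prefix caching must be disabled for these models
)

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["State-space models differ from attention in that"], params)
print(outputs[0].outputs[0].text)
```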
#### Encoder-Decoder Models