Commit e30d84c

[Refactor] Consolidate MoE quantization parameters into FusedMoeQuantConfig

Consolidates multiple boolean quantization parameters (use_fp8_w8a8, use_int8_w8a8, use_int8_w8a16, use_int4_w4a16, per_channel_quant, block_shape) into a single type-safe FusedMoeQuantConfig object across fused_experts, invoke_fused_moe_kernel, and fused_moe functions.

Key improvements:
- Type-safe configuration with QuantizationType enum
- Factory methods for common quantization patterns
- Built-in validation preventing conflicting configurations
- Seamless backward compatibility with deprecation warnings
- Performance optimizations with cached properties
- Cleaner, more maintainable API for future extensions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
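
The diff itself is not reproduced on this page, so the following is only a minimal sketch of how such a consolidation could look, not the code from this commit. The names FusedMoeQuantConfig, QuantizationType, and the legacy flag names come from the commit message; the enum members, the `fp8_w8a8` and `from_legacy_flags` factory methods, the `is_quantized` property, and the specific validation rules are assumptions made for illustration.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from functools import cached_property
from typing import List, Optional


class QuantizationType(Enum):
    # Member names are assumed; they mirror the legacy boolean flags.
    NONE = "none"
    FP8_W8A8 = "fp8_w8a8"
    INT8_W8A8 = "int8_w8a8"
    INT8_W8A16 = "int8_w8a16"
    INT4_W4A16 = "int4_w4a16"


@dataclass
class FusedMoeQuantConfig:
    quantization_type: QuantizationType = QuantizationType.NONE
    per_channel_quant: bool = False
    block_shape: Optional[List[int]] = None

    def __post_init__(self) -> None:
        # Assumed validation rule: a block shape, if given, has two entries.
        if self.block_shape is not None and len(self.block_shape) != 2:
            raise ValueError("block_shape must have exactly two entries")

    @cached_property
    def is_quantized(self) -> bool:
        # Hypothetical cached property: computed once per config instance.
        return self.quantization_type is not QuantizationType.NONE

    @classmethod
    def fp8_w8a8(cls, per_channel_quant: bool = False,
                 block_shape: Optional[List[int]] = None) -> "FusedMoeQuantConfig":
        # Hypothetical factory method for one common pattern.
        return cls(QuantizationType.FP8_W8A8, per_channel_quant, block_shape)

    @classmethod
    def from_legacy_flags(cls, use_fp8_w8a8: bool = False,
                          use_int8_w8a8: bool = False,
                          use_int8_w8a16: bool = False,
                          use_int4_w4a16: bool = False,
                          per_channel_quant: bool = False,
                          block_shape: Optional[List[int]] = None) -> "FusedMoeQuantConfig":
        # Hypothetical backward-compatibility shim for the old boolean flags.
        flags = {
            QuantizationType.FP8_W8A8: use_fp8_w8a8,
            QuantizationType.INT8_W8A8: use_int8_w8a8,
            QuantizationType.INT8_W8A16: use_int8_w8a16,
            QuantizationType.INT4_W4A16: use_int4_w4a16,
        }
        enabled = [qtype for qtype, on in flags.items() if on]
        if len(enabled) > 1:
            # Reject conflicting configurations instead of silently picking one.
            raise ValueError("at most one use_* quantization flag may be set")
        if enabled:
            warnings.warn(
                "boolean quantization flags are deprecated; "
                "pass a FusedMoeQuantConfig instead",
                DeprecationWarning,
                stacklevel=2,
            )
        qtype = enabled[0] if enabled else QuantizationType.NONE
        return cls(qtype, per_channel_quant, block_shape)
```

Under these assumptions, a caller would write `FusedMoeQuantConfig.fp8_w8a8(block_shape=[128, 128])` rather than passing `use_fp8_w8a8=True` alongside a separate `block_shape` argument, and setting two `use_*` flags at once raises an error instead of reaching the kernel.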

1 parent 813e0b8, commit e30d84c

1 file changed, +483 -183 lines changed

vllm/model_executor/layers/fused_moe
- fused_moe.py
