Commit e30d84c

rahul-tuli and claude committed
[Refactor] Consolidate MoE quantization parameters into FusedMoeQuantConfig
Consolidates multiple boolean quantization parameters (use_fp8_w8a8, use_int8_w8a8, use_int8_w8a16, use_int4_w4a16, per_channel_quant, block_shape) into a single type-safe FusedMoeQuantConfig object across fused_experts, invoke_fused_moe_kernel, and fused_moe functions.

Key improvements:
- Type-safe configuration with QuantizationType enum
- Factory methods for common quantization patterns
- Built-in validation preventing conflicting configurations
- Seamless backward compatibility with deprecation warnings
- Performance optimizations with cached properties
- Cleaner, more maintainable API for future extensions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
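The diff itself is not reproduced here, so the following is only a rough sketch of the pattern the message describes, not the code from this commit. The names FusedMoeQuantConfig, QuantizationType, and the legacy flag names come from the message; the enum members, the `fp8_w8a8` and `from_legacy_flags` factory methods, the `is_quantized` property, and the specific validation rules are assumptions made for illustration.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass
from enum import Enum
from functools import cached_property
from typing import Optional


class QuantizationType(Enum):
    # Member names are assumed; they mirror the legacy flag names.
    NONE = "none"
    FP8_W8A8 = "fp8_w8a8"
    INT8_W8A8 = "int8_w8a8"
    INT8_W8A16 = "int8_w8a16"
    INT4_W4A16 = "int4_w4a16"


@dataclass(frozen=True)
class FusedMoeQuantConfig:
    """Hypothetical stand-in for the config object named in the commit."""

    quant_type: QuantizationType = QuantizationType.NONE
    per_channel_quant: bool = False
    block_shape: Optional[tuple[int, int]] = None

    def __post_init__(self) -> None:
        # Assumed rule: a block shape, if given, is two positive ints.
        if self.block_shape is not None:
            if len(self.block_shape) != 2 or any(d <= 0 for d in self.block_shape):
                raise ValueError("block_shape must be two positive integers")

    @cached_property
    def is_quantized(self) -> bool:
        # Computed once per instance, then stored in the instance __dict__.
        return self.quant_type is not QuantizationType.NONE

    @classmethod
    def fp8_w8a8(cls, per_channel_quant=False, block_shape=None):
        # Hypothetical factory for one common pattern.
        return cls(QuantizationType.FP8_W8A8, per_channel_quant, block_shape)

    @classmethod
    def from_legacy_flags(cls, use_fp8_w8a8=False, use_int8_w8a8=False,
                          use_int8_w8a16=False, use_int4_w4a16=False,
                          per_channel_quant=False, block_shape=None):
        # Hypothetical bridge from the old boolean flags to the config object.
        flags = {
            QuantizationType.FP8_W8A8: use_fp8_w8a8,
            QuantizationType.INT8_W8A8: use_int8_w8a8,
            QuantizationType.INT8_W8A16: use_int8_w8a16,
            QuantizationType.INT4_W4A16: use_int4_w4a16,
        }
        chosen = [qt for qt, enabled in flags.items() if enabled]
        if len(chosen) > 1:
            names = ", ".join(qt.value for qt in chosen)
            raise ValueError(f"conflicting quantization flags: {names}")
        if chosen:
            warnings.warn(
                "boolean quantization flags are deprecated; "
                "pass a FusedMoeQuantConfig instead",
                DeprecationWarning,
                stacklevel=2,
            )
        quant_type = chosen[0] if chosen else QuantizationType.NONE
        return cls(quant_type, per_channel_quant, block_shape)
```

Under these assumptions, a caller would write `FusedMoeQuantConfig.fp8_w8a8(block_shape=(128, 128))` instead of passing `use_fp8_w8a8=True` alongside a separate `block_shape` argument. Passing two legacy flags at once raises a `ValueError` instead of silently picking one.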
1 parent 813e0b8 commit e30d84c

1 file changed: +483 -183 lines changed

