@maxnick
Contributor

@maxnick maxnick commented Oct 16, 2025

Details:

In this PR we introduce a new operation, "GatherMatmul", which performs gemv operations over the current tokens and their active experts.
As a first step, we perform the gemv operation using dnnl::inner_product. This solution is suboptimal, as it does not give fine-grained control over parallelization. When many tokens are processed by the same expert (prefill), a gemm operation may be more efficient: the tokens can be batched, which also allows SIMD-level parallelization across tokens.
This PR also contains the transformations needed to enable a few common MoE patterns.
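
To make the semantics above concrete, here is a minimal reference sketch in plain C++. It is not the code from this PR: the names `gather_matmul_ref` and `group_tokens_by_expert`, the row-major `[num_experts, out_dim, in_dim]` weight layout, and the `[num_tokens, top_k]` expert index layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for the gemv-per-(token, expert) semantics; not the plugin code.
// weights:    [num_experts, out_dim, in_dim], row-major (assumed layout)
// tokens:     [num_tokens, in_dim]
// expert_ids: [num_tokens, top_k], active experts for each token
// returns:    [num_tokens, top_k, out_dim]
std::vector<float> gather_matmul_ref(const std::vector<float>& weights,
                                     const std::vector<float>& tokens,
                                     const std::vector<int>& expert_ids,
                                     std::size_t num_experts,
                                     std::size_t out_dim,
                                     std::size_t in_dim,
                                     std::size_t top_k) {
    assert(in_dim > 0 && top_k > 0);
    assert(weights.size() == num_experts * out_dim * in_dim);
    assert(tokens.size() % in_dim == 0);
    const std::size_t num_tokens = tokens.size() / in_dim;
    assert(expert_ids.size() == num_tokens * top_k);

    std::vector<float> out(num_tokens * top_k * out_dim, 0.0f);
    for (std::size_t t = 0; t < num_tokens; ++t) {
        const float* x = tokens.data() + t * in_dim;
        for (std::size_t k = 0; k < top_k; ++k) {
            const int e = expert_ids[t * top_k + k];
            assert(e >= 0 && static_cast<std::size_t>(e) < num_experts);
            // "Gather" the weight matrix of the selected expert.
            const float* w = weights.data() + static_cast<std::size_t>(e) * out_dim * in_dim;
            float* y = out.data() + (t * top_k + k) * out_dim;
            // One gemv: y = W[e] * x.
            for (std::size_t o = 0; o < out_dim; ++o) {
                float acc = 0.0f;
                for (std::size_t i = 0; i < in_dim; ++i) {
                    acc += w[o * in_dim + i] * x[i];
                }
                y[o] = acc;
            }
        }
    }
    return out;
}

// Collects, for each expert, the indices of the tokens routed to it.
// With such groups, the tokens of one expert could be batched into a single gemm.
std::vector<std::vector<std::size_t>> group_tokens_by_expert(const std::vector<int>& expert_ids,
                                                             std::size_t num_experts,
                                                             std::size_t top_k) {
    assert(top_k > 0);
    std::vector<std::vector<std::size_t>> groups(num_experts);
    for (std::size_t idx = 0; idx < expert_ids.size(); ++idx) {
        const int e = expert_ids[idx];
        assert(e >= 0 && static_cast<std::size_t>(e) < num_experts);
        groups[static_cast<std::size_t>(e)].push_back(idx / top_k);
    }
    return groups;
}
```

The first function mirrors the gemv path: one matrix-vector product per (token, active expert) pair. The second only illustrates the regrouping that a gemm-based prefill path would need; how the actual node batches tokens is not specified here.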

MoE pattern matcher is based on #32183

Related oneDNN fork PR: openvinotoolkit/oneDNN#292

Tickets:

@maxnick maxnick assigned maxnick and v-Golubev and unassigned maxnick Oct 16, 2025
@github-actions github-actions bot added the category: Core, category: IE Tests, category: CPU, category: transformations, and category: CPP API labels Oct 16, 2025
@maxnick maxnick requested a review from Copilot October 28, 2025 16:55
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces the GatherMatmul operation to optimize Mixture of Experts (MoE) patterns in the CPU plugin. The implementation performs GEMV operations over active experts using oneDNN's inner_product primitive, with support for both standard and compressed weights configurations.

Key changes:

  • Adds GatherMatmul node implementation with oneDNN-based execution for both GEMV and GEMM modes
  • Implements pattern matchers (MoE2GeMM and MoE3GeMM) to detect and transform MoE subgraphs
  • Extends weight decompression infrastructure to support batched (3D) weight tensors
  • Introduces CompressedWeightsBlock pattern block to share weight decompression logic across operations
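
As an illustration of what batched (3D) weight decompression means, the sketch below dequantizes u8 weights stored per expert. It is a hedged example, not the PR's implementation: the function name `decompress_batched_ref`, the per-output-channel scale and zero point of shape `[num_experts, out_dim]`, and the `(q - zero_point) * scale` formula without grouping are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for decompressing batched weights; not the plugin code.
// q:          [num_experts, out_dim, in_dim] quantized u8 weights (assumed layout)
// scale:      [num_experts, out_dim] per-output-channel scales (assumed granularity)
// zero_point: [num_experts, out_dim] per-output-channel zero points
// returns:    [num_experts, out_dim, in_dim] float weights
std::vector<float> decompress_batched_ref(const std::vector<std::uint8_t>& q,
                                          const std::vector<float>& scale,
                                          const std::vector<float>& zero_point,
                                          std::size_t num_experts,
                                          std::size_t out_dim,
                                          std::size_t in_dim) {
    assert(q.size() == num_experts * out_dim * in_dim);
    assert(scale.size() == num_experts * out_dim);
    assert(zero_point.size() == num_experts * out_dim);

    std::vector<float> w(q.size());
    // Each (expert, output channel) row has its own scale and zero point.
    for (std::size_t row = 0; row < num_experts * out_dim; ++row) {
        for (std::size_t i = 0; i < in_dim; ++i) {
            const std::size_t idx = row * in_dim + i;
            w[idx] = (static_cast<float>(q[idx]) - zero_point[row]) * scale[row];
        }
    }
    return w;
}
```

The only difference from the 2D case in this sketch is the leading expert dimension, which is why the scale and zero point tensors carry the same batch dimension as the weights.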

Reviewed Changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/tests/functional/plugin/shared/src/subgraph/weights_decompression_builders.cpp | Updated to support batched weight decompression with seed parameter and optional transpose control |
| src/tests/functional/plugin/shared/src/subgraph/moe_builders.cpp | New MoE test graph builders for 2GEMM and 3GEMM patterns with weight decompression support |
| src/plugins/intel_cpu/src/nodes/gathermatmul.cpp | Core implementation of GatherMatmul node with oneDNN inner_product backend |
| src/plugins/intel_cpu/src/transformations/cpu_opset/common/pass/convert_moe_matmuls.cpp | Pattern matchers to detect and replace MoE patterns with BatchGatherMatmul operations |
| src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/batch_gather_matmul*.cpp | New internal operations for batch gather matmul with and without compression |
| src/common/transformations/src/transformations/op_conversions/convert_fc_to_compressed.cpp | Refactored to extract reusable weight processing logic into static method |
| src/core/include/openvino/pass/pattern/op/block_util.hpp | Updated FOR_EACH macros to support passing block pointer as parameter |

@maxnick maxnick marked this pull request as ready for review October 28, 2025 16:57
@maxnick maxnick requested review from a team as code owners October 28, 2025 16:57
@maxnick maxnick requested review from mryzhov and removed request for a team October 28, 2025 16:57
@maxnick maxnick force-pushed the cpu_moe_op_support branch from d7f9425 to 91001a5 Compare October 29, 2025 16:24
Labels

category: Core, category: CPP API, category: CPU, category: IE Tests, category: transformations, Code Freeze

2 participants