# Explore options for lowering of vector.contract to FEAT_I8MM, unified between Neon and SVE #145559

Open
@momchil-velikov


MLIR contains two patterns for lowering of vector.contract to FEAT_I8MM "instructions":

  • LowerContractionToNeonI8MMPattern
  • LowerContractionToSVEI8MMPattern

It may be possible and beneficial to develop a unified pattern, able to generate code for either Neon or SVE.

There are some differences in the functionality between the patterns:

  • The Neon pattern handles "arbitrary"[1] indexing maps; the SVE pattern handles only the usual "identities + transposed RHS" one.
  • The Neon pattern can handle input operands of types iN, N <= 8; the SVE pattern only handles i8.
  • In the Neon pattern, the left-hand side, right-hand side, and accumulator/output must be evenly
    divisible into 2x8, 8x2, and 2x2 tiles, respectively; a one-dimensional left-hand side is also supported.
    In the SVE pattern, the left-hand side, right-hand side, and accumulator/output must have shapes <Mx8>+, <8x[N]>, and
    <Mx[N]>, respectively, with M and N even. Notably, the K dimension is fixed to 8 and only the N dimension is allowed to be scalable.
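
The shape constraints in the last bullet can be restated as two small predicates. This is only a sketch of the description above, not the matching logic of either MLIR pattern: the function names are hypothetical, shapes are given as logical (M, K, N) sizes, and the one-dimensional LHS case and the fixed/scalable distinction are deliberately not modelled.

```python
def neon_shapes_ok(m: int, k: int, n: int) -> bool:
    """LHS MxK, RHS KxN, ACC MxN must split evenly into 2x8, 8x2, 2x2 tiles.

    Hypothetical helper; the one-dimensional LHS case is not modelled.
    """
    return min(m, k, n) > 0 and m % 2 == 0 and k % 8 == 0 and n % 2 == 0


def sve_shapes_ok(m: int, k: int, n: int) -> bool:
    """LHS <Mx8>, RHS <8x[N]>, ACC <Mx[N]> with M and N even and K fixed to 8.

    Hypothetical helper; `n` is the base (vscale == 1) size of the
    scalable N dimension.
    """
    return min(m, n) > 0 and m % 2 == 0 and k == 8 and n % 2 == 0


if __name__ == "__main__":
    for shape in [(4, 8, 4), (4, 16, 4), (3, 8, 2)]:
        print(shape, neon_shapes_ok(*shape), sve_shapes_ok(*shape))
```

Under this reading, a shape with K == 16 is accepted by the Neon predicate but rejected by the SVE one, which is the gap discussed under C below.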

[1] "arbitrary" in the sense that the pattern does not impose explicit requirements on the maps and
handles them in a generic manner; however, the pattern does not trigger for <4x8> * <8x4> with
canonical/textbook matrix multiplication maps, whereas it does trigger for <4x8> * <4x8> with maps for a
transposed right-hand side. It is unclear whether this is a bug or by design.
(Also, indexing maps are not entirely "arbitrary": they need to make sense in the context of vector.contract.)

Before any unification, it would be nice if the functionality of both patterns converged to a common point.

A. Indexing maps

  • Restrict the Neon indexing maps
    This is straightforward.
  • Support "arbitrary" indexing maps with SVE (are there any variants other than straight and transposed?)
    This is a bit more involved, but still doable, under the assumption that one would need at most one extra transpose op
    to accommodate the data layout expected by the FEAT_I8MM instructions.
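
To make the layout point concrete, here is a plain-Python model of a single FEAT_I8MM-style tile operation: a 2x2 accumulator is updated from a 2x8 LHS tile and an RHS tile supplied as two rows of eight, i.e. the 8x2 operand in transposed form. The helper names are hypothetical and this is not the code either pattern emits; it only illustrates why a canonical (non-transposed) RHS needs one transpose before it matches the expected layout.

```python
def mmla_2x8x2(acc, lhs, rhs_t):
    """Return acc + lhs * transpose(rhs_t) for a single 2x2 tile.

    acc:   2x2 accumulator
    lhs:   2x8 tile
    rhs_t: 2x8 tile holding the 8x2 RHS in transposed form
    """
    return [
        [acc[i][j] + sum(lhs[i][k] * rhs_t[j][k] for k in range(8))
         for j in range(2)]
        for i in range(2)
    ]


def transpose(matrix):
    """Transpose a list-of-rows matrix."""
    return [list(row) for row in zip(*matrix)]


if __name__ == "__main__":
    lhs = [[1] * 8, [2] * 8]
    rhs = [[1, 0] for _ in range(8)]   # canonical 8x2 layout
    zero = [[0, 0], [0, 0]]
    # One transpose brings the canonical RHS into the expected layout.
    print(mmla_2x8x2(zero, lhs, transpose(rhs)))
```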

B. "Small" integer types (i4, i6, etc)

  • It does not seem reasonable to remove this from Neon.
  • Should not be a problem to add to SVE. May or may not expose the need to add codegen elsewhere (e.g. sign-/zero-extend with scalable vector types).

C. Input/output shapes
This needs a lot of thought. The restriction K == 8 is fairly fundamental to the SVE pattern and provides a number of adjacency guarantees (in the context of FEAT_I8MM). It won't be easy to lift that restriction.
In the context of tiled matrix multiplication (where the operands to the vector.contract do not represent the whole matrix, but just tiles of a bigger one), the ability to have tile dimensions that are many multiples of 8 is unlikely to be very valuable. Even an 8x8 output tile would require 16 SIMD registers; bigger tiles may exceed the number of available registers and introduce spills in something that is likely to be an inner loop.
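
The register estimate above follows from simple arithmetic, sketched here under the assumption of i32 accumulator elements held in 128-bit vector registers (the function name and defaults are illustrative, not taken from MLIR).

```python
def accumulator_registers(m: int, n: int,
                          elem_bits: int = 32,
                          vector_bits: int = 128) -> int:
    """Number of vector registers needed to hold an MxN accumulator tile.

    Assumes i32 elements and 128-bit registers by default.
    """
    total_bits = m * n * elem_bits
    # Ceiling division without floating point.
    return -(-total_bits // vector_bits)


if __name__ == "__main__":
    for m, n in [(2, 2), (8, 8), (16, 16)]:
        print(f"{m}x{n}: {accumulator_registers(m, n)} registers")
```

With these assumptions an 8x8 tile needs 16 registers for the accumulator alone, and a 16x16 tile would need 64, more than the 32 SIMD registers AArch64 provides.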
