
[mlir][Aarch64] Improve i8mm instruction sequence for vector.contract #90416

@dcaballe

Description


The i8mm lowering for some vector.contract ops is functionally correct, but there is room for improvement performance-wise. Looking at the generated asm for an mmt4d with 2x2x8 innermost tile sizes, we get:

    1470: 6e180483      mov     v3.d[1], v4.d[0]
    1474: 4e006204      tbl     v4.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v0.16b
    1478: 4e84a462      smmla   v2.4s, v3.16b, v4.16b
    147c: 6e024041      ext     v1.16b, v2.16b, v2.16b, #0x8
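To make the listing easier to read, here is a small Python reference model of what `smmla` computes. It is written from the Arm architecture description of the instruction, not from the MLIR lowering. A single `smmla` covers exactly one 2x2x8 tile: a 2x8 signed i8 LHS times the transpose of a 2x8 signed i8 RHS, accumulated into a 2x2 i32 tile. That is why the innermost mmt4d tile maps onto one instruction. The helper names and the byte packing are illustrative only.

```python
# Reference model of AArch64 SMMLA (FEAT_I8MM): smmla vd.4s, vn.16b, vm.16b
#   vn: 2x8 signed 8-bit LHS tile, row-major (M x K)
#   vm: 2x8 signed 8-bit RHS tile, already transposed, row-major (N x K)
#   vd: 2x2 signed 32-bit accumulator tile, row-major (M x N)


def to_i8(b):
    """Reinterpret the low 8 bits of b as a signed 8-bit value."""
    b &= 0xFF
    return b - 256 if b >= 128 else b


def to_i32(x):
    """Wrap x to a signed 32-bit value, like a 32-bit accumulator lane."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def smmla(acc, vn, vm):
    """acc: 4 int32 lanes; vn, vm: 16 byte lanes each. Returns the new acc lanes."""
    assert len(acc) == 4 and len(vn) == 16 and len(vm) == 16
    out = list(acc)
    for i in range(2):          # LHS row (M = 2)
        for j in range(2):      # row of the transposed RHS (N = 2)
            s = acc[2 * i + j]
            for k in range(8):  # reduction dimension (K = 8)
                s += to_i8(vn[8 * i + k]) * to_i8(vm[8 * j + k])
            out[2 * i + j] = to_i32(s)
    return out


# One 2x2x8 mmt4d tile == one smmla.
A = [[1, 2, 3, 4, 5, 6, 7, 8],
     [-1, -2, -3, -4, -5, -6, -7, -8]]       # LHS tile, M x K
Bt = [[1] * 8,
      [2] * 8]                                # RHS tile, N x K (transposed)
vn = [x & 0xFF for row in A for x in row]     # pack as raw bytes
vm = [x & 0xFF for row in Bt for x in row]
print(smmla([0, 0, 0, 0], vn, vm))            # [36, 72, -36, -72]
```

Note that the RHS operand is consumed in N x K order, which matches the pre-transposed RHS layout of mmt4d, so no transposition should be needed on the RHS side.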

The mov instruction caught my attention, especially its lane indexing (v4.d[0] moved into v3.d[1]), as did the tbl and ext instructions. This may not seem like a big deal, but the problem gets much worse with larger tile sizes: we observed long sequences of mov and ext instructions all over the generated code.
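For reference, here is what the two data-movement instructions do, modeled byte by byte in Python. The semantics follow the Arm instruction descriptions; the helper names are hypothetical. Neither instruction does any arithmetic: `mov v3.d[1], v4.d[0]` inserts the low 64 bits of `v4` into the high half of `v3`, and `ext` with the same register as both sources and `#0x8` swaps the two 64-bit halves, which for a 2x2 i32 accumulator means swapping its two rows. The sketch only shows what gets moved, not why the compiler chose to move it.

```python
import struct

# A 128-bit NEON register modeled as 16 byte values, byte 0 = least significant.


def mov_d_lane(dst, dst_lane, src, src_lane):
    """mov vD.d[dst_lane], vS.d[src_lane]: copy one 64-bit lane; the other lane of vD is kept."""
    out = list(dst)
    out[8 * dst_lane:8 * dst_lane + 8] = list(src)[8 * src_lane:8 * src_lane + 8]
    return out


def ext(vn, vm, imm):
    """ext vD.16b, vN.16b, vM.16b, #imm: bytes imm..15 of vN, then bytes 0..imm-1 of vM."""
    return (list(vn) + list(vm))[imm:imm + 16]


# mov v3.d[1], v4.d[0]: the low half of v4 lands in the high half of v3.
v3 = bytes(range(16))
v4 = bytes(range(100, 116))
print(mov_d_lane(v3, 1, v4, 0))   # bytes 0..7 of v3, then bytes 0..7 of v4

# ext v1.16b, v2.16b, v2.16b, #0x8 on a 2x2 i32 accumulator swaps its two rows.
v2 = struct.pack("<4i", 10, 11, 20, 21)       # [c00, c01, c10, c11]
v1 = bytes(ext(v2, v2, 8))
print(struct.unpack("<4i", v1))              # (20, 21, 10, 11)
```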

We should investigate what is going on and try to fix it. I suspect that this zero initialization and insertion for vecmat cases might be behind some of these instructions. We should check whether using llvm.undef fixes part of the problem.
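Why `llvm.undef` should be a legal replacement for the zero padding can be checked with a small Python model. In `smmla`, the first row of the 2x2 result depends only on the first row of the LHS operand, so when a vecmat (M = 1) is widened to a 2x8 operand and only the first result row is read back, the contents of the padding row never matter. This sketch assumes the vecmat case is lowered that way (LHS row in the low 64 bits, only c00/c01 consumed); `vecmat_via_smmla` is a hypothetical helper, not the actual MLIR pattern, and random bytes stand in for undef.

```python
import random


def to_i8(b):
    """Reinterpret the low 8 bits of b as a signed 8-bit value."""
    b &= 0xFF
    return b - 256 if b >= 128 else b


def smmla(acc, vn, vm):
    """Model of smmla vd.4s, vn.16b, vm.16b (2x8 LHS, transposed 2x8 RHS, 2x2 i32 acc).
    32-bit wraparound is omitted; it cannot occur with a zero accumulator here."""
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[2 * i + j] = acc[2 * i + j] + sum(
                to_i8(vn[8 * i + k]) * to_i8(vm[8 * j + k]) for k in range(8))
    return out


def vecmat_via_smmla(row, rhs_t, pad_row):
    """Hypothetical vecmat lowering: widen a 1x8 LHS to 2x8 with pad_row as the
    second row, run one smmla, and keep only the first result row (c00, c01)."""
    vn = [x & 0xFF for x in row + pad_row]
    vm = [x & 0xFF for r in rhs_t for x in r]
    return smmla([0, 0, 0, 0], vn, vm)[0:2]


rng = random.Random(0)
row = [rng.randint(-128, 127) for _ in range(8)]
rhs_t = [[rng.randint(-128, 127) for _ in range(8)] for _ in range(2)]
garbage = [rng.randint(-128, 127) for _ in range(8)]   # stands in for undef

zero_padded = vecmat_via_smmla(row, rhs_t, [0] * 8)
undef_padded = vecmat_via_smmla(row, rhs_t, garbage)
print(zero_padded == undef_padded)   # True: the padding row never reaches c00/c01
```

If the padding row is truly irrelevant, the backend is free to leave whatever is already in the upper half of the register, which may let it drop some of the insert/mov traffic; whether it actually does is what the experiment above would tell us.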
