[tuner] Reduce maintenance burden and prepare for more codegen pipelines #453

This is an uber-issue for making the tuner easier to maintain. The current implementation has several problems that make the tuner library fragile and prone to getting out of sync with the IREE compiler. Specifically, we identified the following:

1. There are two ways to (re-)configure executable sources:
  a. By updating the lowering config and translation info in-situ. This is used when producing candidate dispatches using executable benchmarks as the source-of-truth.
  b. By using the transform dialect library script to match root ops and apply compilation info attributes to them. This is used during the model candidate compilation and benchmarking stage.
  
   As a result, we have duplicate logic to apply the configurations found by the constraint solver. The fix is to write a pass that strips existing configuration from executable sources, and then use the transform dialect to re-configure them. Stripping can be done as a separate invocation of `iree-opt`.
  
2. The MLIR processing is mostly string-based. While this allowed us to prototype quickly, it makes the code prone to getting out of sync with the IREE compiler. The lowering config and translation info attributes are considered compiler internals, and there is no stability guarantee as to their exact structure and format. As a result, every time the format changes, we have to update the parsing and printing logic in the tuner to match the new format in the compiler.

   Here, the proposed solution is to expose these key attributes (translation info, compilation info, and MFMA intrinsic info) through Python bindings. Bindings already exist for the GPU pipeline options and can serve as a template for future ones: https://github.com/iree-org/iree/pull/18840.
   
3. Make it easier to identify 'root ops'. We can make the IREE compiler annotate the root linalg ops with a new attribute that the tuner can use to recognize them, without having to duplicate the compiler logic.
   
4. The `Configuration` representation is modeled after the requirements of the `LLVMGPUVectorDistribute` pipeline. As a result, the surrounding code makes implicit assumptions about the problem representation. Instead, we should define an interface that allows us to support multiple compilation pipelines, such that the generated SMT constraints are specific to **both** the pipeline and the dispatch kind. Further, the constraint generation code should be decoupled from the parsing/printing code, such that projects like TKW can use just the constraint generation and benchmarking infra.

5. Move from two stages of compile-and-benchmark to just one. The two-stage flow made sense for SDXL, where the best isolated dispatch does not necessarily perform best across the whole model, but it may not be necessary or even sufficiently general for other applications. This is related to the `libtuner.TuningClient` class; clients should be able to define their own tuning stages, with libtuner providing the interface to specify the compilation and benchmarking commands.
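
To make item 4 more concrete, here is a minimal sketch of what a pipeline- and dispatch-specific constraint interface could look like. Every name in it (`MatmulProblem`, `Candidate`, `ConstraintGenerator`, `SquareTileMatmul`, `HypotheticalPipeline`, `enumerate_candidates`) is hypothetical and not part of the tuner or IREE. The constraints are illustrative only, and a brute-force enumeration stands in for the SMT solver so that the example stays self-contained.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Protocol, Tuple

# Every name in this sketch is hypothetical; none of it is tuner or IREE API.


@dataclass(frozen=True)
class MatmulProblem:
    m: int
    n: int
    k: int


@dataclass(frozen=True)
class Candidate:
    tile_m: int
    tile_n: int
    tile_k: int


# A constraint is a predicate over a candidate. In the real tuner this role
# would be played by an SMT expression handed to the solver.
Constraint = Callable[[Candidate], bool]


class ConstraintGenerator(Protocol):
    """Interface keyed by both the pipeline and the dispatch kind."""

    pipeline: str
    dispatch_kind: str

    def constraints(self, problem: MatmulProblem) -> List[Constraint]: ...


def divisibility(problem: MatmulProblem) -> List[Constraint]:
    # Illustrative rule shared by both generators: tiles evenly divide the problem.
    return [
        lambda c: problem.m % c.tile_m == 0,
        lambda c: problem.n % c.tile_n == 0,
        lambda c: problem.k % c.tile_k == 0,
    ]


class VectorDistributeMatmul:
    pipeline = "LLVMGPUVectorDistribute"
    dispatch_kind = "matmul"

    def constraints(self, problem: MatmulProblem) -> List[Constraint]:
        return divisibility(problem)


class SquareTileMatmul:
    # Made-up pipeline with one extra, made-up rule: square M/N tiles.
    pipeline = "HypotheticalPipeline"
    dispatch_kind = "matmul"

    def constraints(self, problem: MatmulProblem) -> List[Constraint]:
        return divisibility(problem) + [lambda c: c.tile_m == c.tile_n]


# Generators are looked up by (pipeline, dispatch kind); no parsing or
# printing of MLIR is involved at this layer.
REGISTRY: Dict[Tuple[str, str], ConstraintGenerator] = {
    (g.pipeline, g.dispatch_kind): g
    for g in (VectorDistributeMatmul(), SquareTileMatmul())
}


def enumerate_candidates(
    pipeline: str,
    dispatch_kind: str,
    problem: MatmulProblem,
    tile_choices: Tuple[int, ...] = (16, 32, 64, 128),
) -> Iterator[Candidate]:
    """Brute-force stand-in for the solver: yield candidates passing all checks."""
    checks = REGISTRY[(pipeline, dispatch_kind)].constraints(problem)
    for tm, tn, tk in product(tile_choices, repeat=3):
        candidate = Candidate(tm, tn, tk)
        if all(check(candidate) for check in checks):
            yield candidate
```

With this shape, the same `MatmulProblem` produces a different candidate set depending on which pipeline is selected, and a project that only needs constraint generation can depend on this layer without pulling in any MLIR parsing or printing code.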

# Tasks

- [x] Add an iree-opt pass to strip configuration from executable sources (incl. executable benchmarks) @bangtianliu
    * https://github.com/iree-org/iree/pull/19069
- [x] Expose key attributes via Python bindings. @kuhar
    * https://github.com/iree-org/iree/pull/18804
    * https://github.com/iree-org/iree/pull/19095
    * https://github.com/iree-org/iree/pull/19096
    * https://github.com/iree-org/iree/pull/19104
    * https://github.com/iree-org/iree/pull/19107
    * https://github.com/iree-org/iree/pull/19108
    * https://github.com/iree-org/iree/pull/19128
    * https://github.com/iree-org/iree/pull/19129
- [x] Add a utility function to query supported MMA intrinsics and expose it to the C API and Python @bangtianliu
    * https://github.com/iree-org/iree/pull/19124
    * https://github.com/iree-org/iree/pull/19199
    * https://github.com/iree-org/iree/pull/19218
- [x] Use MLIR types for types in the tuner @kuhar
    * https://github.com/nod-ai/shark-ai/pull/554 
- [x] Use IREE attributes for MFMA intrinsics in the tuner @bangtianliu
    * https://github.com/nod-ai/shark-ai/pull/586
    * https://github.com/nod-ai/shark-ai/pull/605
- [x] Use IREE bindings for compilation info (incl. lowering_config and translation_info) @bangtianliu
    * https://github.com/nod-ai/shark-ai/pull/626
    * https://github.com/nod-ai/shark-ai/pull/629
    * https://github.com/iree-org/iree/pull/19376
    * https://github.com/nod-ai/shark-ai/pull/662
    * https://github.com/nod-ai/shark-ai/pull/669
    * https://github.com/nod-ai/shark-ai/pull/678
- [x] Update the tuner to generate candidate dispatches using the new iree-opt pass and transform dialect tuning specs. @Max191
    * https://github.com/nod-ai/shark-ai/pull/606
    * https://github.com/nod-ai/shark-ai/pull/756
- [x] Modify IREE to annotate root ops with a new unit attribute @nithinsubbiah
    * https://github.com/iree-org/iree/pull/19345
- [x] Update the tuner to identify root ops using the new unit attribute produced by IREE @Max191
    * https://github.com/nod-ai/shark-ai/pull/704
- [x] Move constraint generation logic out of the parsing/printing logic in `candidate_gen.py`. @kuhar
    * https://github.com/nod-ai/SHARK-Platform/pull/508
    * https://github.com/nod-ai/SHARK-Platform/pull/526
    * https://github.com/nod-ai/SHARK-Platform/pull/530
    * https://github.com/nod-ai/SHARK-Platform/pull/531
    * https://github.com/nod-ai/SHARK-Platform/pull/539
    * https://github.com/nod-ai/shark-ai/pull/581
- [x] Use only one compile-benchmark stage in `TuningCandidate`. Update the existing example to adapt to this change. @Max191
    * https://github.com/nod-ai/shark-ai/pull/704
- [ ] Fix duplicate builtin attribute registration issues in MLIR/IREE python bindings gen @makslevental
    * https://github.com/nod-ai/shark-ai/pull/670
    * https://github.com/llvm/llvm-project/pull/117918
    * https://github.com/iree-org/iree/pull/19324

