@ethansaurusrex ethansaurusrex commented Aug 6, 2025

Prototype for portable kernels.

MLIR-bytecode-based portable kernels

This prototype relies on the fact that MLIR can be serialized at any point in the compilation pipeline. The serialized MLIR bytecode (MLIRBC) acts as a "binary" that can be placed in the code_object_op; it can later be reloaded into an MLIR module so that the rest of the compilation passes can be run.

To make the MLIR kernels portable, the MLIR generated by MIGraphX used the gfxMIGX arch, a dummy arch with a dummy set of arch features. When the MLIR pass TosaToRock ran, it read this dummy arch and its features and propagated them to the rock ops. See: ROCm/rocMLIR#1937

MIGraphX Changes:

Compilation

On the frontend, a --portable flag was added to the driver's compile command. The compile_ops pass used this flag to follow an alternate path that compiled the code to bytecode as described above. The bytecode was saved in the binary section of the code_object_op, and a format enum was added to mark it. The rest of the compilation passes could then continue as before. This allows us to run all MIGraphX optimizations on the input graph, leaving just the tuning and finalization of the MLIR kernels.
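The tagging step can be pictured with a minimal sketch. It is written in Python for brevity rather than in MIGraphX's C++, and every name in it (CodeObjectFormat, CodeObject, make_code_object) is a hypothetical stand-in, not the actual MIGraphX definition; the PR only states that a format enum was added to mark bytecode payloads.

```python
from dataclasses import dataclass
from enum import Enum


class CodeObjectFormat(Enum):
    # Hypothetical member names; the PR only says a format enum was added.
    BINARY = "binary"                # finalized device code (pre-existing path)
    MLIR_BYTECODE = "mlir_bytecode"  # serialized MLIR awaiting tuning/finalization


@dataclass
class CodeObject:
    """Stand-in for code_object_op: an opaque payload plus a format tag."""
    symbol_name: str
    payload: bytes
    format: CodeObjectFormat = CodeObjectFormat.BINARY


def make_code_object(symbol_name: str, payload: bytes, portable: bool) -> CodeObject:
    """Tag the payload according to whether the portable path produced it."""
    fmt = CodeObjectFormat.MLIR_BYTECODE if portable else CodeObjectFormat.BINARY
    return CodeObject(symbol_name, payload, fmt)
```

Defaulting the tag to the finalized-binary value keeps previously saved graphs, which carry no bytecode, on the existing path.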

Running

When running driver run or driver perf, the driver first checks whether the input file is a compiled graph. If so, it would normally finish the compilation step and go on to run the code. This was modified to add a check for the mlir_bytecode format; if it is present, a new pass, compile_bytecode, is run.
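The load-time check amounts to a small dispatch, sketched here in Python. The function names and the representation of a program as a list of format strings are assumptions made for illustration; only the mlir_bytecode format and the compile_bytecode pass name come from the description above.

```python
def needs_bytecode_pass(code_object_formats):
    """Return True if any code object in the loaded graph still holds MLIR bytecode."""
    return any(fmt == "mlir_bytecode" for fmt in code_object_formats)


def extra_passes_for_loaded_graph(code_object_formats):
    """Hypothetical helper: list the passes to run before execution.

    A fully compiled graph needs none; a portable graph needs compile_bytecode
    to tune and finalize its MLIR kernels on the current device.
    """
    if needs_bytecode_pass(code_object_formats):
        return ["compile_bytecode"]
    return []
```

A graph compiled without --portable contains no bytecode-tagged code objects, so it takes the unchanged path.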

compile_bytecode

This is a new pass that started as a copy of compile_ops. The required changes were modified update_configs and compile functions, which call compilation methods purpose-built for compiling and running MLIR bytecode; these methods were added to src/targets/gpu/mlir.cpp.

SPIRV

In addition to the MLIR code being compiled to bytecode, the non-MLIR kernels were compiled using the amdgcnspirv target arch for amdclang++, which allows those kernels to be portable as well, with some constraints. Namely, the global and local sizes need to be standardized. The wavefront size must also be updated, or multiple code objects must be created, to support the current range of wavefront sizes (as of Aug 2025: 32 or 64).
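One way to picture the "multiple code objects" option is to build one variant per supported size (32 or 64) and pick the matching one at load time. The Python sketch below is only an illustration: build_variants, select_variant, and the compile_fn callback are hypothetical and do not correspond to functions in this PR.

```python
SUPPORTED_SIZES = (32, 64)  # the range stated above, as of Aug 2025


def build_variants(kernel_name, compile_fn):
    """Compile one code object per supported size.

    compile_fn(kernel_name, size) is a caller-supplied stand-in for the
    real SPIR-V compilation step.
    """
    return {size: compile_fn(kernel_name, size) for size in SUPPORTED_SIZES}


def select_variant(variants, device_size):
    """Pick the code object built for the size the device reports."""
    if device_size not in variants:
        raise ValueError(f"no code object built for size {device_size}")
    return variants[device_size]
```

The cost of this approach is that every non-MLIR kernel is stored twice in the portable graph.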

Improvements to be made

As a prototype, it shows that MLIR, or some other form of IR, could be used to create a partially compiled model that is compiled on one machine and then quickly tuned and finalized on another, without redoing the expensive MIGraphX compilation passes and non-MLIR compilations. This is not without its restrictions. The following are some changes that could improve the current (hacky) implementation.

  • rocMLIR: MLIR is not backwards compatible. One of the main appeals of pre-compiled graphs is that they are cross-platform and, hopefully, backwards compatible. This would require a more stable way of generating the portable form, ideally a SPIR-V lowering that could be raised back to MLIR and finalized.
  • MIGraphX: A dedicated portable_code_object operation and compilation passes/modified compile_ops. This would allow us
  • MIGraphX: To facilitate the above dedicated operation and pass,
  • MIGraphX/rocMLIR: The current implementation uses a rocMLIR C API that reads the bytecode and converts it to an MLIR module. This module is then returned and used for a single compilation pass. This is highly wasteful during tuning, since we might do anywhere from 10 to 1000 compilations of a single kernel with different tuning parameters. An improved version, which could also be applied to the rocMLIR pipeline, would do the partial compilation once and then copy the module using my_mlir_module.clone() for each compilation. This would reduce the redundant work done during compilation.
  • rocMLIR: Kernel caching. Since this is a JIT compilation path, it might be beneficial to cache the kernel at "runtime" so that subsequent compilations of the same kernel on the same device can reuse an already compiled kernel instead of tuning again.
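The last two items (parse once and clone per tuning candidate, then cache the result per kernel and device) can be sketched together. This is a toy Python model, not rocMLIR code: parse_bytecode, finalize, the dictionary "module", and the cost function are all invented stand-ins, and copy.deepcopy plays the role of my_mlir_module.clone().

```python
import copy
import hashlib

stats = {"parses": 0}  # counts how often the expensive parse step runs
_kernel_cache = {}     # (bytecode hash, device name) -> tuned "binary"


def parse_bytecode(bytecode):
    """Stand-in for the expensive bytecode-to-module step."""
    stats["parses"] += 1
    return {"ops": list(bytecode), "perf_config": None}


def finalize(module, perf_config):
    """Stand-in for the remaining passes; returns (binary, fake runtime)."""
    module["perf_config"] = perf_config  # mutates only the clone it was given
    return (f"binary[{perf_config}]", abs(perf_config - 3))


def tune(bytecode, device, perf_configs):
    """Return the best binary, parsing once and cloning per candidate."""
    key = (hashlib.sha256(bytecode).hexdigest(), device)
    if key in _kernel_cache:  # same kernel, same device: skip tuning
        return _kernel_cache[key]
    base = parse_bytecode(bytecode)
    results = [finalize(copy.deepcopy(base), cfg) for cfg in perf_configs]
    best_binary, _ = min(results, key=lambda r: r[1])
    _kernel_cache[key] = best_binary
    return best_binary
```

Keying the cache on both the bytecode hash and the device keeps a kernel tuned for one architecture from being reused on another.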

Many other changes could improve the performance and reliability of this prototype, but it still serves as a working possibility for portable models in MIGraphX.

codecov bot commented Aug 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4205   +/-   ##
==========================================
  Coverage           ?   92.23%           
==========================================
  Files              ?      553           
  Lines              ?    25512           
  Branches           ?        0           
==========================================
  Hits               ?    23529           
  Misses             ?     1983           
  Partials           ?        0           
Files with missing lines Coverage Δ
src/include/migraphx/compile_options.hpp 100.00% <ø> (ø)