Prototype for portable kernels.
MLIR-bytecode-based portable kernels
This prototype relies on the fact that MLIR can be serialized at any point in the compilation pipeline. It uses this to create a "binary" that can be placed in the `code_object_op`. This MLIR bytecode (MLIRBC) can later be reloaded into an MLIR module, and the remaining compilation passes can then be run. To make the MLIR kernels portable, the MLIR generated by MIGraphX used the `gfxMIGX` arch, a dummy arch with a dummy set of arch features. When the MLIR pass `TosaToRock` ran, it read this dummy arch and its features and propagated them to the rock ops. See: ROCm/rocMLIR#1937
MIGraphX Changes:
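At a high level, this work splits compilation into a build-machine stage and a target-machine stage, with the dummy arch carried in the serialized kernel until a real device is known. The sketch below models only that hand-off. `portable_kernel`, `compile_portable`, and `retarget` are hypothetical names, the IR is a plain string rather than real MLIR bytecode, and it assumes the dummy arch is simply replaced by the device's arch (for example `gfx942`) once the bytecode is reloaded.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a partially compiled kernel. In the prototype the
// payload would be MLIR bytecode; here it is just a string.
struct portable_kernel
{
    std::string ir;               // serialized IR (placeholder for MLIRBC)
    std::string arch = "gfxMIGX"; // dummy arch used on the build machine
};

// Stage 1 (build machine): serialize the kernel against the dummy arch.
portable_kernel compile_portable(std::string ir)
{
    portable_kernel k;
    k.ir = std::move(ir);
    return k;
}

// Stage 2 (target machine): assumed to replace the dummy arch with the real
// one before the remaining passes (tuning, finalization) run.
portable_kernel retarget(portable_kernel k, const std::string& device_arch)
{
    assert(k.arch == "gfxMIGX" && "kernel has already been retargeted");
    k.arch = device_arch;
    return k;
}
```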
Compilation
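A minimal sketch of the `format` marker and the `--portable` branch described in this section, assuming a heavily simplified `code_object_op`. Only the `mlir_bytecode` format name comes from this PR; the other enum value, `code_object`, `compile_mlir_kernel`, and both helper functions are hypothetical stand-ins, not MIGraphX APIs, and the helpers only copy or tag strings instead of invoking MLIR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the `format` enum added to mark code objects.
enum class code_object_format
{
    finalized,    // finalized GPU binary (the existing path)
    mlir_bytecode // partially compiled MLIR, finished on the target machine
};

// Hypothetical, heavily simplified stand-in for `code_object_op`.
struct code_object
{
    code_object_format format = code_object_format::finalized;
    std::vector<char> binary; // final binary or MLIR bytecode, per `format`
};

// Placeholder: the prototype would serialize the MLIR module to bytecode here.
std::vector<char> serialize_to_bytecode(const std::string& mlir)
{
    return std::vector<char>(mlir.begin(), mlir.end());
}

// Placeholder: the existing path would lower, tune, and finalize here.
std::vector<char> compile_to_binary(const std::string& mlir)
{
    const std::string bin = "binary:" + mlir;
    return std::vector<char>(bin.begin(), bin.end());
}

// How a `--portable` flag could steer the MLIR branch of `compile_ops`.
code_object compile_mlir_kernel(const std::string& mlir, bool portable)
{
    assert(!mlir.empty() && "expected a non-empty MLIR module");
    code_object co;
    if(portable)
    {
        co.format = code_object_format::mlir_bytecode;
        co.binary = serialize_to_bytecode(mlir);
    }
    else
    {
        co.binary = compile_to_binary(mlir);
    }
    return co;
}
```

Storing the bytecode in the same binary field and tagging it with `format` keeps the rest of the pipeline unchanged; only the consumer that finalizes code objects needs to look at the tag.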
On the frontend, a `--portable` flag was added to the driver's `compile` option. The `compile_ops` pass used this flag to follow an alternate path that compiled the code to bytecode as described above. The bytecode was saved in the binary section of the `code_object_op`, and an enum, `format`, was added to mark it. The rest of the compilation passes then continued as before. This allows all MIGraphX optimizations to run on the input graph, leaving only tuning and finalization for the MLIR kernels.
Running
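The decision the driver makes in this step can be sketched as a small dispatch function. `program`, `has_mlir_bytecode`, and `next_step` are hypothetical, the second enum value is assumed, and returning step names as strings is only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified types mirroring the `format` marker.
enum class code_object_format
{
    finalized,
    mlir_bytecode
};

struct code_object
{
    code_object_format format;
};

// Hypothetical view of a program loaded by the driver.
struct program
{
    bool compiled = false;
    std::vector<code_object> code_objects;
};

// True if any code object still carries MLIR bytecode.
bool has_mlir_bytecode(const program& p)
{
    return std::any_of(p.code_objects.begin(), p.code_objects.end(), [](const code_object& co) {
        return co.format == code_object_format::mlir_bytecode;
    });
}

// Which step `driver run` / `driver perf` would take next (names for illustration).
std::string next_step(const program& p)
{
    if(!p.compiled)
        return "compile";          // uncompiled input: full compilation
    if(has_mlir_bytecode(p))
        return "compile_bytecode"; // tune and finalize the MLIR kernels first
    return "run";                  // already finalized: run directly
}
```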
When running `driver run` or `driver perf`, the driver first checks whether the input file is a compiled graph; if so, it would normally finish the compilation step and move on to running the code. This was modified to add a check for the `mlir_bytecode` format. If that format is present, a new pass, `compile_bytecode`, is run.
compile_bytecode
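The tuning half of this pass can be sketched as a benchmark-and-pick loop over candidate configurations. This assumes the modified `update_configs` selects tuning parameters by timing candidates on the target device; `tuning_config`, `pick_best_config`, and the `time_kernel` callback are hypothetical, and the real functions' signatures are not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tuning configuration for one MLIR kernel.
struct tuning_config
{
    std::string perf_config;
};

// Benchmark every candidate with `time_kernel` and return the fastest one.
tuning_config pick_best_config(const std::vector<tuning_config>& candidates,
                               const std::function<double(const tuning_config&)>& time_kernel)
{
    if(candidates.empty())
        throw std::invalid_argument("no tuning candidates");
    std::size_t best      = 0;
    double best_time      = std::numeric_limits<double>::infinity();
    for(std::size_t i = 0; i < candidates.size(); ++i)
    {
        const double t = time_kernel(candidates[i]);
        if(t < best_time)
        {
            best_time = t;
            best      = i;
        }
    }
    return candidates[best];
}
```

Passing the timer in as a callback keeps the selection logic testable without a GPU; on a real device the callback would compile the bytecode with that configuration and time the kernel.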
This new pass started as a copy of `compile_ops`. The required changes were to use modified `update_configs` and `compile` functions, which call compilation methods, added to `src/targets/gpu/mlir.cpp`, that are purpose-built for compiling and running MLIR bytecode.
SPIRV
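The two constraints mentioned in this section can be sketched in a few lines: selecting one of several prebuilt code objects by the device's wavefront size, and rounding the global size up to a multiple of a fixed local size. The map-based lookup, the helper names, and the rounding policy are assumptions for illustration, not necessarily how the prototype standardizes sizes.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: one portable code object per supported wavefront size.
using variants_by_wavefront = std::map<unsigned, std::string>;

// Pick the code object built for the device's wavefront size (e.g. 32 or 64).
const std::string& select_variant(const variants_by_wavefront& variants, unsigned wavefront_size)
{
    auto it = variants.find(wavefront_size);
    if(it == variants.end())
        throw std::out_of_range("no code object for this wavefront size");
    return it->second;
}

// One possible standardization policy: fix the local size and round the global
// size up to a multiple of it, so the launch shape does not depend on the target.
std::size_t round_up_global(std::size_t global, std::size_t local)
{
    if(local == 0)
        throw std::invalid_argument("local size must be non-zero");
    return (global + local - 1) / local * local;
}
```

Rounding the global size up means some work-items fall outside the real problem, so kernels launched this way need a bounds check.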
In addition to the MLIR code being compiled to bytecode, the non-MLIR kernels were compiled for the `amdgcnspirv` target arch with `amdclang++`, which made those kernels portable as well, with some constraints. Namely, the global and local sizes need to be standardized, and the wavefront size must be handled, either by updating it or by creating multiple code objects to support the current range of wavefront sizes (as of Aug 2025: 32 or 64).
Improvements to be made
As a prototype, this shows that MLIR, or some other form of IR, could be used to create a partially compiled model that is compiled on one machine, then quickly tuned and finalized on another, without redoing the expensive MIGraphX compilation passes and non-MLIR compilations. This approach is not without its restrictions. The following are some changes that could improve the performance of the current (hacky) implementation.
`compile_ops`. This would allow us `my_mlir_module.clone()`. This would reduce the redundant work done during compilation.
There are many other improvements that could be made to the performance and reliability of this prototype, but it still serves as a working proof of concept for portable models in MIGraphX.
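One reading of the `my_mlir_module.clone()` idea above is to parse each bytecode blob once and hand out clones afterwards, rather than deserializing the same bytecode repeatedly. A minimal sketch under that assumption; `parsed_module` and `module_cache` are hypothetical stand-ins, and the struct's `clone()` only mirrors the idea of copying an already-parsed module.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a parsed MLIR module.
struct parsed_module
{
    std::string body;
    parsed_module clone() const { return *this; }
};

// Parses each bytecode blob once and returns an independent clone on every
// request. Keying by the full bytecode keeps the sketch simple; a real cache
// would more likely key by a hash or a kernel identifier.
class module_cache
{
public:
    parsed_module get(const std::string& bytecode)
    {
        auto it = cache_.find(bytecode);
        if(it == cache_.end())
        {
            ++parse_count_;
            it = cache_.emplace(bytecode, parse(bytecode)).first;
        }
        return it->second.clone();
    }

    int parse_count() const { return parse_count_; }

private:
    // Placeholder for the expensive deserialization step.
    static parsed_module parse(const std::string& bytecode) { return parsed_module{bytecode}; }

    std::map<std::string, parsed_module> cache_;
    int parse_count_ = 0;
};
```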