ONNX-MLIR v0.5.0.0 is now available with exciting new features. We thank everyone who contributed to this release!
Please visit onnx-mlir to learn more about ONNX-MLIR.

Key Updates

ONNX 1.17.0
PyBind 2.12.0
Benchmark 1.8.4
IBM z17 NNPA Telum II Support Enabled

What's Changed

Add a python script for generating text using huggingface gpt2 by @tungld in #2983
Remove a spike of memory usage in ScrubDisposablePass. by @imaihal in #2978
RunONNXModel.py: Add a --cache-model=path option by @AlexandreEichenberger in #2984
Enable check-onnx-backend-numerical-nnpa on Jenkins s390x by @tungld in #2985
RunONNXModel.py: save compilation info into a file when using --save-model or --cache-model by @tungld in #2994
Fix wrong total number of phases for EmitObj and EmitJNI by @tungld in #2995
run_gpt2_from_huggingface.py: do not download the onnx data file if it exists by @tungld in #2996
Opening binary constants files fix on zOS by @christopherlmunoz in #2991
[NNPA] Memory reduction of stickified constant by stickifying at file writing by @imaihal in #2917
Option to not emit the full MLIR (only emit .tmp file) by @imaihal in #2997
RunONNXModel.py: allow to change the default model name by @tungld in #2999
upgrade to ONNX 1.17.0 (opset 22) by @gongsu832 in #3004
Add decomposition for ONNXSoftmaxCrossEntropyLossOp by @srcarroll in #2968
Delay scrubbing disposable elements attrs as long as possible by @tungld in #3006
Add limitation for BFLOAT supported ops for NNPA by @Sunny-Anand in #3008
Test the return value of omMMapBinaryFile function and terminate the main program elegantly by @tungld in #3002
Fix a wrong function call by @tungld in #3012
Making runtime omunreachable static to support clang compiler by @christopherlmunoz in #3015
Fix security vulnerabilities by @Sunny-Anand in #3019
Do not fuse locations when normalizing constants for Add and Mul by @jorickert in #3016
Handle full reduction over all dimensions by @tungld in #3022
Use DisposableElementsAttr for ZHigh constant propagation by @tungld in #3013
Re-enable diagnostic error/warning printing by @AlexandreEichenberger in #3020
Transform SequenceAt to split for special cases by @chentong319 in #3018
Add tolerance args to CheckONNXModel.py by @AlexandreEichenberger in #3024
Return a failure instead of crashing if shape inference cannot be run because of unranked operand types by @jorickert in #3023
upgrade benchmark by @Sunny-Anand in #3027
Update llvm-project to llvm/llvm-project@01d233ff403823389f848 by @hamptonm1 in #3011
Update llvm-project to llvm/llvm-project@af20aff35ec3 by @hamptonm1 in #3032
Fix biasScaleShape of GroupNormalizationV21 to support ranks > 4 by @jorickert in #3030
Merge from repo by @AlexandreEichenberger in #3033
Update llvm-project to llvm/llvm-project@e86910337f98 by @hamptonm1 in #3037
Best practice by @AlexandreEichenberger in #3039
[NNPA] Fix some bugs for ReduceMin/Max by @tungld in #3038
Skip over uninitialized DenseResourceAttrs in verifiers by @jorickert in #3041
[NNPA] Revise compiler options for quantization by @tungld in #3043
Update the instruction for building multiple accelerators by @tungld in #3046
Add a document for quantization on NNPA by @tungld in #3045
update onnx opset by @Sunny-Anand in #3050
Remove element type restriction in softmax lowering by @srcarroll in #3051
Fix ASAN/UBSAN issues in DimAnalysis by @jorickert in #3052
Build light weight PyRuntime without llvm or onnx-mlir by @chentong319 in #3044
Option to set the number of threads for parallel compilation by @imaihal in #3048
Update onnx requirement to 1.17.0 by @jorickert in #3054
Optimization for Roberta unstick->reshape->transpose->reshape->stick by @AlexandreEichenberger in #3056
Extend GridSample support by @jorickert in #3060
Remove the pattern unstick_4ds_squeeze_stick_3ds by @tungld in #3062
Instrumentation cleanup when operation was removed by @AlexandreEichenberger in #3061
Add support for ONNX.shape with permutation pattern by @AlexandreEichenberger in #3066
Update docker image to point to github registry in devcontainer-example by @jorickert in #3055
Parallelization of ConstProp compilation by @imaihal in #3042
Bump various ops to opset 22, adding bf16 support by @jorickert in #3059
Bump onnx.Cast to opset 21, adding int/uint4 support by @jorickert in #3057
Add runtime check for Gather Op by @chentong319 in #3069
fix weak hash by @Sunny-Anand in #3070
Remove the compile option -nnpa-clip-to-dlfloat-range by @tungld in #3075
Matmul CPU performance regression by @AlexandreEichenberger in #3072
ZHigh to ONNX optimization is default on. Switch flag from enable to disable by @AlexandreEichenberger in #3074
Since compiler generated stick/unstick is default on, change new option to disable it by @AlexandreEichenberger in #3073
Add lit tests for KrnlMatmulOp lowering (Krnl to affine) by @AlexandreEichenberger in #3076
Upgrading llvm and stablehlo hash by @christopherlmunoz in #3053
Don't try to free static array in mnist example by @Zentrik in #3049
Handle out-of-bound value for Gather alike operation by @chentong319 in #3077
Extend instrumentSignature to print data by @chentong319 in #3078
Modifying RunONNXModel.py to better support external performance profiling tools by @AlexandreEichenberger in #3082
Add support to use either docker or local compiler to compile a model by @chentong319 in #3081
Use docker and podman package in python driver by @chentong319 in #3087
update pybind11 to version 2.12.0 by @chentong319 in #3088
Bump Upsample to Opset 10 and change the opset versioning to allow to skip over opset versions if a newer, backwards compatible one exists. by @jorickert in #3065
Improve scripts by @AlexandreEichenberger in #3089
Add result type inference to RandomNormalLike and fix wrong hardcodings for dtypes by @jorickert in #3091
Bump various ops to opset 21, adding int4/uint4 and 8 bit float support. by @jorickert in #3064
Added minimal support to do some timing of OM Runtime functionality by @AlexandreEichenberger in #3095
Including __errno_location call for MVS by @christopherlmunoz in #3099
Rewriting pattern to remove WhereOp and EqualOp. by @imaihal in #3094
Enable NNPA saturation by default and change the option to --nnpa-disable-saturation by @tungld in #3101
removing weak attribute of errno by @christopherlmunoz in #3103
Fix the custom build link for docs/Docker.md by @qjivy in #3104
Python driver for torch model by @chentong319 in #3093
Cherry pick updates from main for z17 and fix for ZHighConstantPropagation in QuantizedStick by @Sunny-Anand in #3133
[cherry-pick]fix CVE-2025-32434 (#3135) by @Sunny-Anand in #3...