shorten mlir dump filename #4197
Conversation
src/targets/gpu/mlir.cpp (Outdated)
    mlir_program::get_symbol_name(m) + "_" + shape::to_sizes_string(to_shapes(sizes)) + ext;
    replace_string_inplace(name, ", ", "_");
    replace_string_inplace(name, ":", "s");
    truncate_file_name(name, ext, 255);
We should document somewhere why this limit exists.
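For reference, a minimal sketch (not from the PR; the constant name is made up) of how the limit could be documented as a named constant:

```cpp
#include <cstddef>

// Most Linux filesystems (ext4, xfs, btrfs) limit a single path component to
// NAME_MAX = 255 bytes; creating a dump file with a longer name fails with
// ENAMETOOLONG, which is why the MLIR dump names have to be shortened.
constexpr std::size_t max_dump_file_name_length = 255;
```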
src/targets/gpu/mlir.cpp (Outdated)
    abbreviate_symbol_names(fname);

    // if file name is still too long after shortening op names, truncate
    if(fname.length() > max_len)
Truncating would mean the same filename for different kernels in some cases. We may not see the MLIR dump in that case; it's a problem.
What do you suggest we do in that case?
I am not sure what the best approach would be.
We could do a few things, e.g. maintain a counter and, if names are the same, suffix them with _0, _1, etc.
Or completely redesign the kernel name, e.g.
for attention, only keep dot_softmax_dot in the name, followed by a hash value computed from the MLIR kernel, so that each unique MLIR kernel gets a different name.
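A minimal sketch of the hash-suffix idea, assuming the full MLIR kernel text is available as a string (the helper name is hypothetical, and std::hash is used only for illustration; a stable hash would be preferable if names must be reproducible across runs):

```cpp
#include <functional>
#include <iomanip>
#include <sstream>
#include <string>

// Keep a short readable prefix (e.g. "mlir_dot_softmax_dot") and append a hash
// of the full MLIR kernel text so each unique kernel gets a distinct file name.
std::string hashed_dump_name(const std::string& prefix, const std::string& mlir_text)
{
    std::size_t h = std::hash<std::string>{}(mlir_text);
    std::ostringstream ss;
    ss << prefix << "_" << std::hex << std::setw(16) << std::setfill('0') << h << ".mlir";
    return ss.str();
}
```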
Updated to address some of this. There is definitely room for improvement with the added counter; currently it only exists to make sure file names don't conflict and cause overwrites, but the number is otherwise meaningless. If this is reasonable for now, let's move forward and leave further improvements for a future update?
Incidentally, the need for an added counter is debatable here: the shortened (new) names don't collide with any existing name, so a unique file name will stay unique with your new scheme.
src/targets/gpu/mlir.cpp (Outdated)
    mlirRockTuningParamDestroy);

    // NOLINTNEXTLINE
    static std::atomic<int> dump_counter = 0;
- If you can use a dump_counter, then what prevents us from using just that counter as the file name?
- Of course, we shouldn't dump duplicates (which perhaps get overwritten right now) with an increasing counter value -- a possible problem in this scheme.
- Nothing stops you, but it's not desirable. A half-truncated name with a counter will give you more info about what to expect in the file than just a counter.
- Not sure what you mean. In general we don't dump duplicates (each kernel name will be unique), but if you have two really large names and only the ends of the names are different, truncating will cause the names to match. That's the issue this is solving.
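A minimal sketch of the collision being described, with made-up (and shortened) names where only the trailing dimension strings differ:

```cpp
#include <cassert>
#include <string>

int main()
{
    // Two kernel names that differ only near the end.
    std::string a = "mlir_dot_add_mul_reduce_sum_32x16x512x64";
    std::string b = "mlir_dot_add_mul_reduce_sum_32x16x512x128";

    // Truncating both to the same budget erases the distinguishing suffix,
    // so without a counter (or hash) both kernels would dump to the same file.
    const std::size_t budget = 28;
    assert(a.substr(0, budget) == b.substr(0, budget));
    return 0;
}
```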
> Nothing stops you, but it's not desirable. A half-truncated name with a counter will give you more info about what to expect in the file than just a counter.

"Not desirable" is very subjective. An algorithmic solution is needed here. One could find a scheme to drastically reduce the filename length, or just pick the rolling number that you have coded in, and solve this issue for good! :-)
Unfortunately, the idea of calling transpose "tpose" or something like it is not the answer here, because the file names are still very large. Large dimension strings are appended, and the trend toward fusion (and thus longer names) isn't going away! We would be forced to revisit the file name issue again. Since the name has been munged already, we might as well have a simple name; that was my thought.

> Not sure what you mean. In general we don't dump duplicates (each kernel name will be unique), but if you have two really large names and only the ends of the names are different, truncating will cause the names to match. That's the issue this is solving.

If the names were duplicates (in the old or your new naming scheme), we currently overwrite that file. But after appending a rolling number, the duplicated file name becomes unique, so it will be dumped more than once.
I don't think this needs to be over-engineered; this is a developer-only debug tool and doesn't affect users at all. The main consumers of this tool are our MLIR friends, so I will let @umangyadav, @dhernandez0 and team decide whether they like the current scheme or whether a completely uninformative name is their preference.
If the full names were duplicates, it's the same module and we would not request MLIR to compile it more than once. We don't currently overwrite the file, because that scenario is not possible as far as I understand.
From my side, I generally debug one MLIR kernel at a time, because that's generally what we get from you guys (a single failing or slow kernel). So I don't have a preference about what the filename should look like.
Pull Request Overview
This PR addresses filename length issues when using the MIGRAPHX_MLIR_DUMP flag by implementing a filename shortening mechanism. The solution prevents compilation errors that occur when generated MLIR dump filenames exceed filesystem limits (typically 255 bytes).
- Implements operation name abbreviation to reduce filename length
- Adds filename truncation with counter suffix as fallback when abbreviation isn't sufficient
- Introduces atomic counter to ensure unique filenames when truncation is required
src/targets/gpu/mlir.cpp (Outdated)
    auto sym_names = mlir_program::get_symbol_name(m);
    abbreviate_symbol_names(sym_names);

    static int max_file_length = 255;
Copilot AI (Aug 13, 2025)
The magic number 255 should be defined as a named constant at the top of the file or in a header, making it easier to maintain and understand its purpose across the codebase.
Suggested change:

    - static int max_file_length = 255;
    + constexpr int max_file_length = kMaxFileNameLength;
src/targets/gpu/mlir.cpp (Outdated)
    if(sym_names.length() + shape_str.length() + ext.length() > max_file_length)
    {
        auto cnt = "_" + std::to_string(dump_counter++);
        auto cutoff = max_file_length - shape_str.length() - ext.length() - cnt.length();
Copilot AI (Aug 13, 2025)
The cutoff calculation could result in a negative value if the combined length of shape_str, ext, and cnt exceeds max_file_length. This could cause undefined behavior in the substr call on line 1137.
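A small sketch of the concern, assuming the lengths involved are size_t as returned by std::string::length() (the helper is illustrative, not the PR's code): with unsigned arithmetic the subtraction wraps around instead of going negative, so a later cutoff > 0 check cannot catch that case; doing the math in a signed type makes the guard meaningful.

```cpp
#include <cstddef>

// Illustrative only: compute the characters left for the symbol part without
// risking size_t wrap-around when the fixed parts already exceed max_len.
std::size_t safe_cutoff(std::size_t max_len, std::size_t shape_len, std::size_t ext_len, std::size_t cnt_len)
{
    auto remaining = static_cast<std::ptrdiff_t>(max_len) -
                     static_cast<std::ptrdiff_t>(shape_len + ext_len + cnt_len);
    return remaining > 0 ? static_cast<std::size_t>(remaining) : 0;
}
```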
src/targets/gpu/mlir.cpp (Outdated)
    // mlir_x to avoid erroring out
    fname =
        cutoff > 0 ? sym_names.substr(0, cutoff) + shape_str + cnt + ext : "mlir" + cnt + ext;
    }
If a file name like mlir_0 or mlir_1 is acceptable, then why not just dump every file with that consistent scheme (mlir_1, mlir_2, ...)? That was my point earlier.
Because if you use this flag with a large model you might get tens of mlir_ files; would you rather open every single one to see what they are about? I think it would be nice to see straight away what kind of kernel each file represents with a quick look at the name. As I mentioned in the other comment, I'm happy to do this if the MLIR team would prefer it :)
Please have a consistent naming scheme. The MLIR team is okay with either naming scheme, but we shouldn't have two. You can take the first approach (truncate the long name to the appropriate length and add a numeral to it) if you want to see the kernel kind straight away, though I find that very subjective and personal. So either truncate the full name, or have a simple naming scheme. We should not add confusion while debugging, IMHO!
(simple change suggested above).
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff            @@
    ##           develop    #4197   +/-   ##
    ========================================
      Coverage    92.23%   92.23%
    ========================================
      Files          553      553
      Lines        25627    25628    +1
    ========================================
    + Hits         23635    23636    +1
      Misses        1992     1992
src/targets/gpu/mlir.cpp (Outdated)
    }
    fname += cnt + ext;

    replace_string_inplace(fname, ", ", "_");
If this is shortening the file name by replacing the two characters ", " with a single "_", then that should be accounted for in the new name-truncation logic.
Good catch, I totally missed the space after the comma.
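A minimal sketch of doing the substitutions before the length check so the truncation budget is measured on the name as it is written to disk (helper names are assumptions, not the PR's functions):

```cpp
#include <string>

// Replace every occurrence of `from` with `to`, in place.
void replace_all(std::string& s, const std::string& from, const std::string& to)
{
    for(std::size_t pos = s.find(from); pos != std::string::npos; pos = s.find(from, pos + to.size()))
        s.replace(pos, from.size(), to);
}

std::string finalize_dump_name(std::string fname, const std::string& ext, std::size_t max_len)
{
    // Shortening substitutions first...
    replace_all(fname, ", ", "_");
    replace_all(fname, ":", "s");
    // ...then measure and truncate, so the 255-byte budget is exact.
    if(fname.size() + ext.size() > max_len)
        fname.resize(max_len > ext.size() ? max_len - ext.size() : 0);
    return fname + ext;
}
```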
| {"transpose", "trp"}, | ||
| {"slice", "slc"}}; | ||
|  | ||
| for(auto const& [key, val] : abbrs) | 
There is a bug lurking here: what if the loop iterates over the string reshape before a longer string (for softmax) which also has the same substring inside it? It just happens that the std::map is in an order which works here. Change it to another data structure, e.g. a hash map, and the renaming scheme could be a surprise for us :-)
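A minimal sketch of the ordering hazard with made-up abbreviation keys where one key is a prefix of another, plus one way to make the result independent of container iteration order:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// If "reduce" were applied before "reduce_max", the longer key could never match
// afterwards, so the outcome would depend on iteration order (std::map is
// lexicographic; an unordered_map promises nothing). Sorting keys by descending
// length removes the dependence on the container's order.
void apply_abbreviations(std::string& name, std::vector<std::pair<std::string, std::string>> abbrs)
{
    std::sort(abbrs.begin(), abbrs.end(), [](const auto& a, const auto& b) {
        return a.first.size() > b.first.size();
    });
    for(const auto& [key, val] : abbrs)
        for(std::size_t pos = name.find(key); pos != std::string::npos; pos = name.find(key, pos + val.size()))
            name.replace(pos, key.size(), val);
}

int main()
{
    std::string name = "mlir_reduce_max_reduce_sum";
    apply_abbreviations(name, {{"reduce", "rdc"}, {"reduce_max", "rmax"}, {"reduce_sum", "rsum"}});
    std::cout << name << "\n"; // prints "mlir_rmax_rsum" regardless of how abbrs was stored
    return 0;
}
```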
src/targets/gpu/mlir.cpp (Outdated)
    if(fname.length() + cnt.length() + ext.length() > max_file_length)
    {
        auto cutoff = max_file_length - ext.length() - cnt.length();
        fname = fname.substr(0, cutoff);
I have to agree with the Copilot here :-)
Correction: it is the cppcheck suggestion.
    // On most commonly used file systems, the max file name size is 255 characters
    const int max_file_length = 255;
    auto cnt = "_" + std::to_string(dump_counter()++);
This nitpicking likely shows my poor programming aesthetics :-)
But does it need a struct dump_counter above? Or would a single line defining a static variable with this name also suffice? This is fine, however.
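For what it's worth, a minimal sketch of the two alternatives being weighed; the function-wrapping-a-local-static form is an assumption based on the dump_counter()++ call in the excerpt above:

```cpp
#include <atomic>

// Alternative 1: a plain file-scope static (as in the earlier revision's excerpt).
// NOLINTNEXTLINE
static std::atomic<int> dump_counter_var{0};

// Alternative 2: a function returning a reference to a local static
// ("construct on first use"), which is what dump_counter()++ suggests.
static std::atomic<int>& dump_counter()
{
    static std::atomic<int> counter{0};
    return counter;
}
```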
Approving this PR. (Thanks for addressing the file naming concerns.) But please see the issues that I have highlighted in my comments, and fix as you see fit @shivadbhavsar.
This build is not recommended to merge 🔴

❌ bert-mrpc-tf: ERROR - check error output (error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]; repeated "error: Failure when generating HSACO" and "UNKNOWN: JIT compilation failed" at node import/bert/embeddings/LayerNorm/moments/SquaredDifference)
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
Currently when using the MIGRAPHX_MLIR_DUMP flag, compile will error out due to the filename being too long in cases of large fusions. To avoid this, this PR adds logic to:

1. Abbreviate/shorten op names
2. If 1. is not enough, truncate the name to make sure program execution does not error out

Note: Most common filesystems have a 255 byte limit, and so the number is hard coded. I was not able to find a reliable programmatic way to determine this number based on the user system, please comment if you know!

Example shortening:

Before:
./mlir_where_slice_reshape_transpose_slice_reshape_transpose_slice_reshape_transpose_dot_add_convert_mul_reshape_reduce_max_reshape_sub_exp_reshape_reduce_sum_reshape_div_convert_dot_transpose_reshape_32x16x512x64s524288x64x1024x1_32x16x64x512s524288x64x1x1024_32x16x512x512_32x16x512x64s524288x64x1024x1.mlir

Now:
./mlir_where_slc_rsp_trp_slc_rsp_trp_slc_rsp_trp_dot_add_convert_mul_rsp_softmax_convert_dot_trp_rsp_32x16x512x64s524288x64x1024x1_32x16x64x512s524288x64x1x1024_32x16x512x512_32x16x512x64s524288x64x1024x1.mlir
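On the "reliable programmatic way" question: POSIX exposes a per-directory limit via pathconf, though it can legitimately report the limit as indeterminate and it does not help on Windows; a hedged sketch:

```cpp
#include <cstddef>
#include <unistd.h>

// Query the maximum file-name length for files created under `dir`,
// falling back to 255 (NAME_MAX on most Linux filesystems) when pathconf
// reports the limit as indeterminate (returns -1).
std::size_t max_name_length(const char* dir)
{
    long n = ::pathconf(dir, _PC_NAME_MAX);
    return n > 0 ? static_cast<std::size_t>(n) : 255;
}
```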