Handle compiler warnings for rocm7.0 #4192
Pull Request Overview
This PR addresses compiler warnings encountered when building with ROCm 7.0, specifically handling two types of warnings that prevent clean builds.
- Disables contradictory switch-default warnings in Clang builds, where `-Weverything` creates conflicting requirements
- Suppresses deprecated-declaration warnings for `get_temporary_buffer` in specific header includes until GCC is upgraded
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description | 
|---|---|
| cmake/EnableCompilerWarnings.cmake | Adds the `-Wno-switch-default` flag to disable contradictory switch warnings in Clang builds | 
| src/include/migraphx/shape.hpp | Wraps memory header include with pragma to suppress deprecated declarations warning | 
| src/include/migraphx/algorithm.hpp | Wraps algorithm header include with pragma to suppress deprecated declarations warning | 
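
The header-wrapping approach listed in the table can be sketched as below. This is a minimal illustration rather than the PR's actual diff: `sorted_copy` is a hypothetical helper, and the sketch assumes a compiler that understands `#pragma GCC diagnostic` (GCC and Clang both do).

```cpp
// Sketch: silence -Wdeprecated-declarations only while one standard header is read.
// The wrapped include comes first so that this is the header's first inclusion;
// include guards would otherwise make the pragma region cover nothing.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#include <algorithm> // warnings located inside this header are suppressed
#pragma GCC diagnostic pop

#include <cassert>
#include <vector>

// Hypothetical helper (not part of MIGraphX): returns a stably sorted copy.
inline std::vector<int> sorted_copy(std::vector<int> v)
{
    std::stable_sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

Because the suppression is scoped by push/pop, deprecated uses in the project's own code are still reported; only diagnostics whose location falls inside the wrapped header are hidden.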
Is there a larger backtrace for this? I am curious why this gets instantiated with our 

@pfultz2 Yeah, below is the trace for  
Just to document some of my findings, which you probably saw too. So this is a bug in clang (that hasn't been fixed) when using  Which is caused by this commit in clang:

We could disable the warning globally, but only after checking for this case, since this is only a problem with gcc 12. That is what was done in the ceph project: https://github.com/ceph/ceph/pull/62622/files

I don't think we need to do the push/pop stuff in cmake, we can just use the 

set(CMAKE_REQUIRED_FLAGS "-Werror=deprecated-declarations")
check_cxx_source_compiles("
#include <algorithm>
int main() { std::stable_sort((int *)0, (int*)0); }
" COMPILER_IGNORES_DEPRECATED_DECL_IN_SYSTEM_HEADERS)
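
The probe above relies on `std::stable_sort` reaching the deprecated function. A minimal sketch of the same situation in ordinary code follows; `item` and `sort_by_key` are hypothetical names standing in for a project type such as `migraphx::builtin::param`, and the comment about `get_temporary_buffer` describes libstdc++ as shipped with GCC 12, not other standard libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type for illustration only.
struct item
{
    int key;
    char tag;
};

// Sorts by key while keeping the original order of equal keys.
inline std::vector<item> sort_by_key(std::vector<item> items)
{
    // In GCC 12's libstdc++, std::stable_sort requests scratch space through an
    // internal temporary buffer, which calls std::get_temporary_buffer<item>.
    // That call sits in a system header, so the deprecation warning is normally
    // expected to stay hidden from user builds.
    std::stable_sort(items.begin(), items.end(), [](const item& a, const item& b) {
        return a.key < b.key;
    });
    assert(items.empty() or items.front().key <= items.back().key);
    return items;
}
```

If a build reports the deprecation for a call like this, the warning is attributed to the standard library header rather than to the calling code, which is the case the CMake check is meant to detect.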

@pfultz2 I just added a commit that does essentially this, but instead of conditionally turning off the warning, it conditionally tells clang where the gcc lib dir is. What are your thoughts? I think turning off the warning entirely is pretty broad and might hide warnings that we want to see. This way, it specifies exactly which lib to use if the warning exists and libstd11 exists.

FYI, I am not moving this out of draft just yet because there are a ton of rocmlir warnings now. I think they might need to make some changes on their end, but I need to make sure.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4192   +/-   ##
==========================================
  Coverage           ?   92.23%           
==========================================
  Files              ?      553           
  Lines              ?    25628           
  Branches           ?        0           
==========================================
  Hits               ?    23636           
  Misses             ?     1992           
  Partials           ?        0           
d35165a to e5ecc7e
This build is not recommended to merge 🔴
❌bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
2025-08-08 05:05:45.833059: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754647551.046720 168411 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62951 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:32:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754647551.908417 168411 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-08 05:06:00.435415: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435588: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435633: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435674: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435703: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435749: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435792: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 05:06:00.435838: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-08-08 05:06:00.436910: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-08-08 05:06:00.438002: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-08-08 05:06:00.438023: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
  [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-08-08 05:06:00.438036: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
  [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
  [[import/loss/output/_21]]
2025-08-08 05:06:00.438053: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled.
Key hash: 11217777527359497193
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) UNKNOWN: JIT compilation failed.
    [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
    [[import/loss/output/_21]]
  (1) UNKNOWN: JIT compilation failed.
    [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 335, in main
    y_out = sess.run(y, feed_dict=tf_dict)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
    raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
  (0) UNKNOWN: JIT compilation failed.
    [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
    [[import/loss/output/_21]]
  (1) UNKNOWN: JIT compilation failed.
    [[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':
🔴unet: FAILED: MIGraphX is not within tolerance - check verbose output
🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
b573411 to 334827c

@pfultz2 I removed the CXX flags line as requested. This is ready to be merged now.
resolves #4193
Updates:
There are two additional warnings, but they are coming from external projects: rocMLIR and the upstream libs of cget. These aren't really solvable, but I have an issue open on rocMLIR right now. If I can suppress the cget warnings, I'll make an additional PR.
Ethan solved the msgpack one. As for rocMLIR, I have added an ignore for deprecated declarations, and it is on the rocMLIR project to handle the warning.
Original:
Two main warnings currently:
- /workspace/AMDMIGraphX/src/include/migraphx/value.hpp:354:9: warning: 'switch' missing 'default' label [-Wswitch-default]
  The contradicting warning is: default label in switch which covers all enumeration values. We have `-Weverything` enabled for clang builds, which allows for contradictory warnings. This is a known issue, and the recommendation is to use `-Wall` + `-Wextra` instead. However, disabling just one of the contradicting flags is narrower in scope and already has precedent in migraphx. This PR disables the default label warning similarly.
- /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_tempbuf.h:263:8: warning: 'get_temporary_buffer<migraphx::builtin::param>' is deprecated [-Wdeprecated-declarations]
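
The switch conflict described above can be sketched with a small example. The enum `kind` and the function `kind_name` are hypothetical and are not taken from `value.hpp`; the sketch only assumes Clang's `-Wswitch-default` and `-Wcovered-switch-default` warnings, both of which `-Weverything` turns on.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration only.
enum class kind
{
    integer,
    real,
    text
};

inline std::string kind_name(kind k)
{
    // Every enumerator has a case, so there is no `default:` label.
    // -Wswitch-default complains that the label is missing; adding one
    // makes -Wcovered-switch-default complain that it is redundant.
    switch(k)
    {
    case kind::integer: return "integer";
    case kind::real: return "real";
    case kind::text: return "text";
    }
    // Reached only if k holds a value outside the declared enumerators.
    assert(false && "invalid kind");
    return "unknown";
}
```

Leaving out `default:` keeps the plain `-Wswitch` check useful: if a new enumerator is added later, the compiler can point at every switch that does not handle it. That is one reason to silence `-Wswitch-default` rather than its counterpart.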