
Conversation

@CharlieL7
Collaborator

@CharlieL7 CharlieL7 commented Aug 4, 2025

  • Part of https://github.com/ROCm/AMDMIGraphX-internal/issues/149
  • Adds a migraphx::byte class for handling a generic byte of data
    • The type prevents computation such as addition or incrementing
  • Refactors to_gpu and from_gpu to use raw buffers
    • A side effect of making them work for non-computable types such as fp4x2_type
  • Adds raw_data.fallback_visit() to check whether a type is computable during a visit
    • For a non-computable type it does not visit and instead constructs a tensor_view<migraphx::byte>

@codecov

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 90.24390% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/include/migraphx/raw_data.hpp 90.91% 2 Missing ⚠️
src/shape.cpp 66.67% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4201      +/-   ##
===========================================
- Coverage    92.23%   92.22%   -0.01%     
===========================================
  Files          553      556       +3     
  Lines        25628    25697      +69     
===========================================
+ Hits         23636    23697      +61     
- Misses        1992     2000       +8     
Files with missing lines Coverage Δ
src/generate.cpp 93.62% <100.00%> (+0.93%) ⬆️
src/include/migraphx/byte.hpp 100.00% <100.00%> (ø)
src/include/migraphx/shape.hpp 88.89% <ø> (ø)
src/include/migraphx/tensor_view.hpp 100.00% <ø> (ø)
src/include/migraphx/raw_data.hpp 96.67% <90.91%> (-0.63%) ⬇️
src/shape.cpp 91.90% <66.67%> (-0.16%) ⬇️

... and 7 files with indirect coverage changes


@CharlieL7 CharlieL7 changed the title from Byte class hip data to Byte class and Refactor to/from GPU Aug 5, 2025
@CharlieL7 CharlieL7 self-assigned this Aug 5, 2025
@CharlieL7 CharlieL7 marked this pull request as ready for review August 5, 2025 06:34
@CharlieL7 CharlieL7 requested a review from causten as a code owner August 5, 2025 06:34
@CharlieL7 CharlieL7 requested a review from shivadbhavsar August 5, 2025 16:17
@CharlieL7 CharlieL7 requested a review from pfultz2 August 6, 2025 18:52
@CharlieL7 CharlieL7 requested a review from pfultz2 August 8, 2025 15:36
Contributor

@shivadbhavsar shivadbhavsar left a comment


lgtm

/**
 * Created to have a custom stream operator so that it prints as an unsigned int.
 * This type is essentially a limited unsigned char, to prevent things like trying to add two bytes.
 */
enum class byte : unsigned char
Contributor


All your casts are to uint8_t; however, this enum's underlying type is unsigned char.

Collaborator Author


This matches the proposed standard-library implementation of std::byte.

template <class IntType,
          MIGRAPHX_REQUIRES(std::is_integral<IntType>{} and std::is_unsigned<IntType>{})>
constexpr byte operator<<(byte b, IntType shift) noexcept
{
    return static_cast<byte>(static_cast<uint8_t>(b) << shift);
}
Contributor


Perhaps there would be no reason to cast if the underlying type of this enum were uint8_t instead of unsigned char; I wonder whether all these static_casts could then just go away.

Contributor

@lakhinderwalia lakhinderwalia left a comment


Approving this PR.

As a style issue, I prefer not having two casts when one would do, but that shouldn't be a blocker in any way for the excellent work already done. Thanks.

@migraphx-bot
Collaborator

Test  Batch  Rate new (f74172)  Rate old (7a7354)  Diff
torchvision-resnet50 64 3,246.13 3,229.52 0.51%
torchvision-resnet50_fp16 64 6,958.31 6,923.81 0.50%
torchvision-densenet121 32 2,450.17 2,439.36 0.44%
torchvision-densenet121_fp16 32 4,165.23 4,153.21 0.29%
torchvision-inceptionv3 32 1,634.80 1,627.81 0.43%
torchvision-inceptionv3_fp16 32 2,752.74 2,742.45 0.38%
cadene-inceptionv4 16 771.04 767.53 0.46%
cadene-resnext64x4 16 813.64 808.53 0.63%
slim-mobilenet 64 7,458.00 7,423.40 0.47%
slim-nasnetalarge 64 211.05 209.98 0.51%
slim-resnet50v2 64 3,342.72 3,328.19 0.44%
bert-mrpc-onnx 8 1,145.15 1,136.58 0.75%
bert-mrpc-tf 1 444.64 442.43 0.50%
pytorch-examples-wlang-gru 1 300.61 289.61 3.80% 🔆
pytorch-examples-wlang-lstm 1 415.05 403.60 2.84%
torchvision-resnet50_1 1 770.80 768.91 0.25%
cadene-dpn92_1 1 391.63 391.63 0.00%
cadene-resnext101_1 1 393.52 390.78 0.70%
onnx-taau-downsample 1 395.79 394.27 0.39%
dlrm-criteoterabyte 1 33.75 33.22 1.61%
dlrm-criteoterabyte_fp16 1 51.21 51.07 0.27%
agentmodel 1 8,347.52 8,866.82 -5.86% 🔴
unet_fp16 2 59.16 58.96 0.34%
resnet50v1_fp16 1 976.71 983.06 -0.65%
resnet50v1_int8 1 1,024.65 1,037.34 -1.22%
bert_base_cased_fp16 64 1,107.18 1,100.24 0.63%
bert_large_uncased_fp16 32 345.22 343.67 0.45%
bert_large_fp16 1 197.85 197.39 0.23%
distilgpt2_fp16 16 2,117.63 2,105.86 0.56%
yolov5s 1 575.69 578.10 -0.42%
tinyllama 1 43.94 43.74 0.44%
vicuna-fastchat 1 45.25 45.02 0.52%
whisper-tiny-encoder 1 417.68 416.34 0.32%
whisper-tiny-decoder 1 400.50 408.06 -1.85%
llama2_7b 1 19.17 19.11 0.29%
qwen1.5-7b 1 23.53 23.42 0.49%
phi3-3.8b 1 26.68 26.60 0.31%
mask-rcnn 1 12.51 12.49 0.17%
llama3-8b 1 21.75 21.65 0.47%
whisper-large-encoder 1 10.22 10.18 0.37%
whisper-large-decoder 1 96.86 96.79 0.07%
mistral-7b 1 23.71 23.61 0.42%
FLUX.1-schnell 1 743.94 748.70 -0.64%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌ bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-22 13:49:57.855983: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1755888603.279076 173624 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62951 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1755888604.186091 173624 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-22 13:50:12.894353: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
(the same error repeated 8 times)
error: Failure when generating HSACO
(repeated 8 times)
2025-08-22 13:50:12.895985: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-08-22 13:50:12.897281: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-08-22 13:50:12.897301: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-08-22 13:50:12.897313: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-08-22 13:50:12.897328: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 335, in main
    y_out = sess.run(y, feed_dict=tf_dict)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴 mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

@causten causten merged commit b4e9a34 into develop Aug 26, 2025
49 of 52 checks passed
@causten causten deleted the byte_class_hip_data branch August 26, 2025 15:54
7 participants