
Conversation

@CharlieL7
Collaborator

@CharlieL7 CharlieL7 commented Aug 4, 2025

  • Part of https://github.com/ROCm/AMDMIGraphX-internal/issues/149
  • Adds a migraphx::byte class for handling a generic byte of data
    • The type prevents computation such as addition or incrementing
  • Refactors to_gpu and from_gpu to use raw buffers
    • A side effect of making them work for non-computable types such as fp4x2_type
  • Adds raw_data.fallback_visit() to check whether a type is computable during a visit
    • For a non-computable type it does not visit and instead constructs a tensor_view<migraphx::byte>

@codecov

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 90.24390% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/include/migraphx/raw_data.hpp 90.91% 2 Missing ⚠️
src/shape.cpp 66.67% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4201      +/-   ##
===========================================
- Coverage    92.23%   92.22%   -0.01%     
===========================================
  Files          553      556       +3     
  Lines        25628    25697      +69     
===========================================
+ Hits         23636    23697      +61     
- Misses        1992     2000       +8     
Files with missing lines Coverage Δ
src/generate.cpp 93.62% <100.00%> (+0.93%) ⬆️
src/include/migraphx/byte.hpp 100.00% <100.00%> (ø)
src/include/migraphx/shape.hpp 88.89% <ø> (ø)
src/include/migraphx/tensor_view.hpp 100.00% <ø> (ø)
src/include/migraphx/raw_data.hpp 96.67% <90.91%> (-0.63%) ⬇️
src/shape.cpp 91.90% <66.67%> (-0.16%) ⬇️

... and 7 files with indirect coverage changes


@CharlieL7 CharlieL7 changed the title from Byte class hip data to Byte class and Refactor to/from GPU Aug 5, 2025
@CharlieL7 CharlieL7 self-assigned this Aug 5, 2025
@CharlieL7 CharlieL7 marked this pull request as ready for review August 5, 2025 06:34
@CharlieL7 CharlieL7 requested a review from causten as a code owner August 5, 2025 06:34
@CharlieL7 CharlieL7 requested a review from shivadbhavsar August 5, 2025 16:17
@CharlieL7 CharlieL7 requested a review from pfultz2 August 6, 2025 18:52
@CharlieL7 CharlieL7 requested a review from pfultz2 August 8, 2025 15:36
Contributor

@shivadbhavsar shivadbhavsar left a comment


lgtm

/**
 * Created to have a custom stream operator so that it prints as an unsigned int.
 * This type is essentially a limited unsigned char, to prevent things like trying to add two bytes.
 */
enum class byte : unsigned char
Contributor


All your casts are to uint8_t; however, this enum's underlying type is unsigned char.

Collaborator Author


This matches the proposed standard-library implementation of std::byte.

template <class IntType,
          MIGRAPHX_REQUIRES(std::is_integral<IntType>{} and std::is_unsigned<IntType>{})>
constexpr byte operator<<(byte b, IntType shift) noexcept
{
    return static_cast<byte>(static_cast<uint8_t>(b) << shift);
}
Contributor


Perhaps there would be no reason to cast if the underlying type of this enum were uint8_t instead of unsigned char; I wonder whether all these static_casts could then just go away.

Contributor

@lakhinderwalia lakhinderwalia left a comment


Approving this PR.

As a style issue, I prefer not having two casts when one would do, but that shouldn't be a blocker in any way for the excellent work already done. Thanks.

@migraphx-bot
Collaborator

Test  Batch  Rate new (f74172)  Rate old (7a7354)  Diff
torchvision-resnet50 64 3,246.13 3,229.52 0.51%
torchvision-resnet50_fp16 64 6,958.31 6,923.81 0.50%
torchvision-densenet121 32 2,450.17 2,439.36 0.44%
torchvision-densenet121_fp16 32 4,165.23 4,153.21 0.29%
torchvision-inceptionv3 32 1,634.80 1,627.81 0.43%
torchvision-inceptionv3_fp16 32 2,752.74 2,742.45 0.38%
cadene-inceptionv4 16 771.04 767.53 0.46%
cadene-resnext64x4 16 813.64 808.53 0.63%
slim-mobilenet 64 7,458.00 7,423.40 0.47%
slim-nasnetalarge 64 211.05 209.98 0.51%
slim-resnet50v2 64 3,342.72 3,328.19 0.44%
bert-mrpc-onnx 8 1,145.15 1,136.58 0.75%
bert-mrpc-tf 1 444.64 442.43 0.50%
pytorch-examples-wlang-gru 1 300.61 289.61 3.80% 🔆
pytorch-examples-wlang-lstm 1 415.05 403.60 2.84%
torchvision-resnet50_1 1 770.80 768.91 0.25%
cadene-dpn92_1 1 391.63 391.63 0.00%
cadene-resnext101_1 1 393.52 390.78 0.70%
onnx-taau-downsample 1 395.79 394.27 0.39%
dlrm-criteoterabyte 1 33.75 33.22 1.61%
dlrm-criteoterabyte_fp16 1 51.21 51.07 0.27%
agentmodel 1 8,347.52 8,866.82 -5.86% 🔴
unet_fp16 2 59.16 58.96 0.34%
resnet50v1_fp16 1 976.71 983.06 -0.65%
resnet50v1_int8 1 1,024.65 1,037.34 -1.22%
bert_base_cased_fp16 64 1,107.18 1,100.24 0.63%
bert_large_uncased_fp16 32 345.22 343.67 0.45%
bert_large_fp16 1 197.85 197.39 0.23%
distilgpt2_fp16 16 2,117.63 2,105.86 0.56%
yolov5s 1 575.69 578.10 -0.42%
tinyllama 1 43.94 43.74 0.44%
vicuna-fastchat 1 45.25 45.02 0.52%
whisper-tiny-encoder 1 417.68 416.34 0.32%
whisper-tiny-decoder 1 400.50 408.06 -1.85%
llama2_7b 1 19.17 19.11 0.29%
qwen1.5-7b 1 23.53 23.42 0.49%
phi3-3.8b 1 26.68 26.60 0.31%
mask-rcnn 1 12.51 12.49 0.17%
llama3-8b 1 21.75 21.65 0.47%
whisper-large-encoder 1 10.22 10.18 0.37%
whisper-large-decoder 1 96.86 96.79 0.07%
mistral-7b 1 23.71 23.61 0.42%
FLUX.1-schnell 1 743.94 748.70 -0.64%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌ bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-22 13:49:57.855983: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1755888603.279076 173624 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62951 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1755888604.186091 173624 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-22 13:50:12.894353: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
(the same error repeated 8 times)
error: Failure when generating HSACO
(repeated 8 times)
2025-08-22 13:50:12.895985: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-08-22 13:50:12.897281: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-08-22 13:50:12.897301: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-08-22 13:50:12.897313: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-08-22 13:50:12.897328: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 335, in main
    y_out = sess.run(y, feed_dict=tf_dict)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴 mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

@causten causten merged commit b4e9a34 into develop Aug 26, 2025
49 of 52 checks passed
@causten causten deleted the byte_class_hip_data branch August 26, 2025 15:54
7 participants