Commit 4276805

Results from GH actions on NVIDIA_RTX4090x1
1 parent 4a29057 commit 4276805

29 files changed, 928 additions and 928 deletions

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/TEST01/performance/run_1/mlperf_log_detail.txt

Lines changed: 88 additions & 88 deletions
Large diffs are not rendered by default.

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/TEST01/performance/run_1/mlperf_log_summary.txt

Lines changed: 12 additions & 12 deletions
@@ -4,7 +4,7 @@ MLPerf Results Summary
 SUT name : Server_3DUNet
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 4.13119
+Samples per second: 4.13091
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -13,21 +13,21 @@ Result is : VALID
 ================================================
 Additional Stats
 ================================================
-Min latency (ns) : 298712163
-Max latency (ns) : 676560238746
-Mean latency (ns) : 338326669742
-50.00 percentile latency (ns) : 338139075542
-90.00 percentile latency (ns) : 609730279416
-95.00 percentile latency (ns) : 642730489228
-97.00 percentile latency (ns) : 656637634523
-99.00 percentile latency (ns) : 670082985816
-99.90 percentile latency (ns) : 676138334761
+Min latency (ns) : 298535141
+Max latency (ns) : 676605939537
+Mean latency (ns) : 338358768315
+50.00 percentile latency (ns) : 338208959742
+90.00 percentile latency (ns) : 609759218730
+95.00 percentile latency (ns) : 642763934616
+97.00 percentile latency (ns) : 656673216829
+99.00 percentile latency (ns) : 670124197203
+99.90 percentile latency (ns) : 676183478194
 
 ================================================
 Test Parameters Used
 ================================================
-samples_per_query : 2753
-target_qps : 4.17174
+samples_per_query : 2756
+target_qps : 4.17593
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
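Both Offline `samples_per_query` values are consistent with LoadGen sizing the single Offline query as `target_qps` × `min_duration` × 1.1, truncated to an integer. The 1.1 slack factor and the truncation are assumptions inferred from these two runs, not something the log states, and `offline_samples_per_query` is a hypothetical helper. A minimal sketch that reproduces both logged values:

```python
# Hypothetical helper: estimate how many samples LoadGen packs into the single
# Offline query. The 1.1 slack factor and int() truncation are assumptions
# inferred from the logged numbers, not read from LoadGen itself.
def offline_samples_per_query(target_qps: float, min_duration_ms: int,
                              slack: float = 1.1) -> int:
    min_duration_s = min_duration_ms / 1000.0
    return int(target_qps * min_duration_s * slack)


# (target_qps, logged samples_per_query) from the old and new summaries.
runs = {"old": (4.17174, 2753), "new": (4.17593, 2756)}
for label, (qps, logged) in runs.items():
    estimate = offline_samples_per_query(qps, 600000)
    print(f"{label}: estimated {estimate}, logged {logged}")
```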
@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 4.13044
-test score = 4.13119
+reference score = 4.13458
+test score = 4.13091
 TEST PASS
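Here the reference score is the throughput of the regular performance run (the new value, 4.13458, matches the README below) and the test score comes from the performance run repeated under TEST01's settings, which additionally log a sampled subset of responses. A minimal sketch of the comparison, assuming the checker accepts a test score strictly within ±10% of the reference; that tolerance is an assumption, not printed in this log, and both function names are hypothetical:

```python
# Hypothetical re-implementation of a TEST01-style performance comparison.
# The 10% tolerance is an assumption; the log only prints "TEST PASS".
def within_tolerance(reference: float, test: float, tolerance: float = 0.10) -> bool:
    """Return True if `test` lies strictly inside reference * (1 +/- tolerance)."""
    return reference * (1 - tolerance) < test < reference * (1 + tolerance)


def relative_delta(reference: float, test: float) -> float:
    """Signed relative change of the test score versus the reference score."""
    return (test - reference) / reference


# Offline throughput scores (samples/s) before and after this commit.
offline = {"old": (4.13044, 4.13119), "new": (4.13458, 4.13091)}
for label, (ref, test) in offline.items():
    print(f"{label}: delta {relative_delta(ref, test):+.4%}, "
          f"pass={within_tolerance(ref, test)}")
```

Both runs differ from their reference by less than 0.1%, far inside any plausible tolerance band.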

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/singlestream/TEST01/performance/run_1/mlperf_log_detail.txt

Lines changed: 92 additions & 92 deletions
Large diffs are not rendered by default.

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/singlestream/TEST01/performance/run_1/mlperf_log_summary.txt

Lines changed: 13 additions & 13 deletions
@@ -4,38 +4,38 @@ MLPerf Results Summary
 SUT name : Server_3DUNet
 Scenario : SingleStream
 Mode : PerformanceOnly
-90th percentile latency (ns) : 438111926
+90th percentile latency (ns) : 438341980
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
 Early stopping satisfied: Yes
 Early Stopping Result:
 * Processed at least 64 queries (5934).
 * Would discard 538 highest latency queries.
-* Early stopping 90th percentile estimate: 438304240
-* Early stopping 99th percentile estimate: 504327018
+* Early stopping 90th percentile estimate: 438520623
+* Early stopping 99th percentile estimate: 504614656
 
 ================================================
 Additional Stats
 ================================================
 QPS w/ loadgen overhead : 4.43
 QPS w/o loadgen overhead : 4.43
 
-Min latency (ns) : 28882761
-Max latency (ns) : 514368406
-Mean latency (ns) : 225482846
-50.00 percentile latency (ns) : 176110159
-90.00 percentile latency (ns) : 438111926
-95.00 percentile latency (ns) : 503573566
-97.00 percentile latency (ns) : 503841379
-99.00 percentile latency (ns) : 504231354
-99.90 percentile latency (ns) : 504874330
+Min latency (ns) : 28868490
+Max latency (ns) : 514085641
+Mean latency (ns) : 225585142
+50.00 percentile latency (ns) : 176219743
+90.00 percentile latency (ns) : 438341980
+95.00 percentile latency (ns) : 503845557
+97.00 percentile latency (ns) : 504166942
+99.00 percentile latency (ns) : 504519643
+99.90 percentile latency (ns) : 505430196
 
 ================================================
 Test Parameters Used
 ================================================
 samples_per_query : 1
-target_qps : 4.93531
+target_qps : 4.93586
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
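With `samples_per_query : 1` and `max_async_queries : 1`, SingleStream keeps only one query in flight, so the reported `QPS w/o loadgen overhead` should be close to the reciprocal of the mean latency. A quick check against both runs; `qps_from_mean_latency` is a hypothetical helper, and treating LoadGen's figure as exactly 1 / mean latency is an assumption:

```python
NS_PER_S = 1_000_000_000


def qps_from_mean_latency(mean_latency_ns: int) -> float:
    """One query in flight at a time, so QPS is roughly 1 / mean latency."""
    return NS_PER_S / mean_latency_ns


# Mean latencies (ns) from the old and new SingleStream summaries.
for label, mean_ns in {"old": 225482846, "new": 225585142}.items():
    print(f"{label}: {qps_from_mean_latency(mean_ns):.2f} QPS")
```

Both values round to the logged 4.43.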
@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 437936140
-test score = 438304240
+reference score = 438082736
+test score = 438520623
 TEST PASS
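For SingleStream the two scores are latencies in nanoseconds, and each test score equals the early-stopping 90th-percentile estimate in the SingleStream summary (438304240 before, 438520623 after), so a larger value is worse. A small, hypothetical parser for this report layout that also computes the relative change; the regex assumes exactly the `name score = value` lines shown here:

```python
import re

# Matches lines such as "reference score = 438082736" in the report above.
SCORE_RE = re.compile(r"^(reference|test) score = ([0-9.]+)$", re.MULTILINE)


def parse_verification(text: str) -> dict:
    """Hypothetical parser: extract both scores and the pass/fail verdict."""
    scores = {name: float(value) for name, value in SCORE_RE.findall(text)}
    scores["passed"] = "TEST PASS" in text
    return scores


report = """Verifying performance.
reference score = 438082736
test score = 438520623
TEST PASS
"""

result = parse_verification(report)
# SingleStream scores are latencies in ns, so a positive delta means slower.
delta = (result["test"] - result["reference"]) / result["reference"]
print(result, f"{delta:+.3%}")
```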

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/README.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 ## Host platform
 
-* OS version: Linux-6.8.0-53-generic-x86_64-with-glibc2.29
+* OS version: Linux-6.8.0-60-generic-x86_64-with-glibc2.29
 * CPU version: x86_64
 * Python version: 3.8.10 (default, Feb 4 2025, 15:02:54)
 [GCC 9.4.0]
@@ -17,7 +17,7 @@ pip install -U mlcflow
 
 mlc rm cache -f
 
-mlc pull repo mlcommons@mlperf-automations --checkout=06b95fa9f0b3e5cedf5295a7b630442b2f9ffac3
+mlc pull repo mlcommons@mlperf-automations --checkout=cc1d43d1d5eeebee7efa08a1aa0f1cc62fcb1560
 
 
 ```
@@ -41,4 +41,4 @@ Model Precision: int8
 `DICE`: `0.86236`, Required accuracy for closed division `>= 0.86084`
 
 ### Performance Results
-`Samples per second`: `4.13044`
+`Samples per second`: `4.13458`
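The closed-division floor `0.86084` is consistent with the "99.9" in `3d-unet-99.9` meaning 99.9% of a reference DICE, if that reference is 0.86170 for 3D-UNet on KiTS19. The 0.86170 value is an assumption and does not appear in this diff. A quick arithmetic check:

```python
# Assumed reference mean DICE for 3D-UNet on KiTS19 (not taken from this log).
REFERENCE_DICE = 0.86170
TARGET_FRACTION = 0.999  # the "99.9" in the 3d-unet-99.9 benchmark name

required = round(REFERENCE_DICE * TARGET_FRACTION, 5)
achieved = 0.86236  # DICE reported in the README
print(f"required >= {required}, achieved {achieved}, ok={achieved >= required}")
```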

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/accuracy_console.out

Lines changed: 19 additions & 19 deletions
@@ -1,8 +1,8 @@
-[2025-05-01 21:47:27,092 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_05f8e0d586fd
-[2025-05-01 21:47:27,367 harness.py:249 INFO] The harness will load 3 plugins: ['build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so', 'build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so', 'build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so']
-[2025-05-01 21:47:27,367 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_05f8e0d586fd_TRT/3d-unet-99.9/Offline
-[2025-05-01 21:47:27,367 __init__.py:46 INFO] Running command: ./build/bin/harness_3dunet --plugins="build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so,build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so,build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=43 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/kits19/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf" --tensor_path="build/preprocessed_data/KiTS19/inference/int8" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c2cf6864065340ebbfa6602c226164b0.conf" --unet3d_sw_gaussian_patch_path="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy" --gpu_engines="./build/engines/Nvidia_05f8e0d586fd/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan" --max_dlas=0 --slice_overlap_patch_kernel_cg_impl=false --scenario Offline --model 3d-unet
-[2025-05-01 21:47:27,367 __init__.py:53 INFO] Overriding Environment
+[2025-06-01 06:45:54,205 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_0aef8e1065ca
+[2025-06-01 06:45:54,486 harness.py:249 INFO] The harness will load 3 plugins: ['build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so', 'build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so', 'build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so']
+[2025-06-01 06:45:54,486 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_0aef8e1065ca_TRT/3d-unet-99.9/Offline
+[2025-06-01 06:45:54,486 __init__.py:46 INFO] Running command: ./build/bin/harness_3dunet --plugins="build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so,build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so,build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=43 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/kits19/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf" --tensor_path="build/preprocessed_data/KiTS19/inference/int8" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf" --unet3d_sw_gaussian_patch_path="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy" --gpu_engines="./build/engines/Nvidia_0aef8e1065ca/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan" --max_dlas=0 --slice_overlap_patch_kernel_cg_impl=false --scenario Offline --model 3d-unet
+[2025-06-01 06:45:54,486 __init__.py:53 INFO] Overriding Environment
 benchmark : Benchmark.UNET3D
 buffer_manager_thread_count : 0
 data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/data
@@ -11,23 +11,23 @@ gpu_copy_streams : 1
 gpu_inference_streams : 1
 input_dtype : int8
 input_format : linear
-log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_32fceb49/repo/closed/NVIDIA/build/logs/2025.05.01-21.47.26
+log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_32fceb49/repo/closed/NVIDIA/build/logs/2025.06.01-06.45.53
 map_path : data_maps/kits19/val_map.txt
 mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf
 offline_expected_qps : 0.0
 precision : int8
 preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data
 scenario : Scenario.Offline
 slice_overlap_patch_kernel_cg_impl : False
-system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='13th Gen Intel(R) Core(TM) i9-13900K', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=131.634496, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=131634496000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=None, system_id='Nvidia_05f8e0d586fd')
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='13th Gen Intel(R) Core(TM) i9-13900K', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=131.63447200000002, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=131634472000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=None, system_id='Nvidia_0aef8e1065ca')
 tensor_path : build/preprocessed_data/KiTS19/inference/int8
 test_mode : AccuracyOnly
 unet3d_sw_gaussian_patch_path : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy
 use_deque_limit : True
 use_graphs : False
-user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c2cf6864065340ebbfa6602c226164b0.conf
-system_id : Nvidia_05f8e0d586fd
-config_name : Nvidia_05f8e0d586fd_3d-unet_Offline
+user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf
+system_id : Nvidia_0aef8e1065ca
+config_name : Nvidia_0aef8e1065ca_3d-unet_Offline
 workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99_9, PowerSetting.MaxP)
 optimization_level : plugin-enabled
 num_profiles : 1
@@ -39,23 +39,23 @@ power_limit : None
 cpu_freq : None
 &&&& RUNNING MLPerf_Inference_3DUNet_Harness # ./build/bin/harness_3dunet
 [I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf
-[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c2cf6864065340ebbfa6602c226164b0.conf
+[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf
 Creating QSL.
 Finished Creating QSL.
 Setting up SUT.
 [I] [TRT] Loaded engine size: 31 MiB
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 79, GPU 1097 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 79, GPU 1107 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 80, GPU 1097 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 80, GPU 1107 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +29, now: CPU 0, GPU 29 (MiB)
-[I] Device:0: ./build/engines/Nvidia_05f8e0d586fd/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan has been successfully loaded.
+[I] Device:0: ./build/engines/Nvidia_0aef8e1065ca/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan has been successfully loaded.
 [E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 48, GPU 1813 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 48, GPU 1821 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 49, GPU 1813 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 49, GPU 1821 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2218, now: CPU 0, GPU 2247 (MiB)
 [I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: true
 Finished setting up SUT.
 Starting warmup. Running for a minimum of 5 seconds.
-Finished warmup. Ran for 5.44798s.
+Finished warmup. Ran for 5.46776s.
 Starting running actual test.
 
 No warnings encountered during test.
@@ -72,8 +72,8 @@ Device Device:0 processed:
 PerSampleCudaMemcpy Calls: 43
 BatchedCudaMemcpy Calls: 0
 &&&& PASSED MLPerf_Inference_3DUNet_Harness # ./build/bin/harness_3dunet
-[2025-05-01 21:47:44,594 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2025-05-01 21:47:44,594 __init__.py:46 INFO] Running command: python3 code/3d-unet/tensorrt/accuracy_kits.py --log_file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy/mlperf_log_accuracy.json
+[2025-06-01 06:46:11,679 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2025-06-01 06:46:11,679 __init__.py:46 INFO] Running command: python3 code/3d-unet/tensorrt/accuracy_kits.py --log_file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy/mlperf_log_accuracy.json
 Loading necessary metadata...
 Loading loadgen accuracy log...
 Running postprocessing...
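In the `system` dump, each `byte_suffix` carries a `(base, exponent)` pair, and `_num_bytes` equals `quantity` × base^exponent: host memory is reported in decimal GB (1000³ bytes) and VRAM in binary GiB (1024³ bytes). The trailing `...00000002` in the new host `quantity` looks like ordinary floating-point rounding in how that figure was derived; that reading is an assumption, while the integer `_num_bytes` fields are reproduced exactly below. A minimal check:

```python
# Reproduce the byte counts in the SystemConfiguration dump from the
# (base, exponent) suffix pairs: GB = 1000**3 bytes, GiB = 1024**3 bytes.
GB = 1000 ** 3
GIB = 1024 ** 3

vram_bytes = 25757220864
host_bytes_old, host_bytes_new = 131634496000, 131634472000

print(vram_bytes / GIB)                          # 23.98828125 is exact in binary floating point
print(host_bytes_old / GB, host_bytes_new / GB)  # host memory in decimal GB, old and new
print(host_bytes_old - host_bytes_new)           # change in reported host memory, in bytes
```

The reported host memory shrank by 24,000 bytes between the two runs, well inside the `comparison_tolerance=0.05` shown in the same dump.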

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/cpu_info.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
 "MLC_HOST_CPU_WRITE_PROTECT_SUPPORT": "yes",
-"MLC_HOST_CPU_MICROCODE": "0x12b",
+"MLC_HOST_CPU_MICROCODE": "0x12c",
 "MLC_HOST_CPU_FPU_SUPPORT": "yes",
 "MLC_HOST_CPU_FPU_EXCEPTION_SUPPORT": "yes",
 "MLC_HOST_CPU_BUGS": "spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb rfds bhi",
