
Commit bfb6b5a

Results from GH actions on NVIDIA_RTX4090x1
1 parent b25edf5 commit bfb6b5a

File tree

29 files changed: +1094 −1092 lines changed

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/TEST01/performance/run_1/mlperf_log_detail.txt

Lines changed: 88 additions & 88 deletions
Large diffs are not rendered by default.

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/TEST01/performance/run_1/mlperf_log_summary.txt

Lines changed: 12 additions & 12 deletions
@@ -4,7 +4,7 @@ MLPerf Results Summary
 SUT name : Server_3DUNet
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 4.13091
+Samples per second: 4.13288
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -13,21 +13,21 @@ Result is : VALID
 ================================================
 Additional Stats
 ================================================
-Min latency (ns) : 298535141
-Max latency (ns) : 676605939537
-Mean latency (ns) : 338358768315
-50.00 percentile latency (ns) : 338208959742
-90.00 percentile latency (ns) : 609759218730
-95.00 percentile latency (ns) : 642763934616
-97.00 percentile latency (ns) : 656673216829
-99.00 percentile latency (ns) : 670124197203
-99.90 percentile latency (ns) : 676183478194
+Min latency (ns) : 298512052
+Max latency (ns) : 676283956742
+Mean latency (ns) : 338172778485
+50.00 percentile latency (ns) : 338023369252
+90.00 percentile latency (ns) : 609452975214
+95.00 percentile latency (ns) : 642449807518
+97.00 percentile latency (ns) : 656357027727
+99.00 percentile latency (ns) : 669804491385
+99.90 percentile latency (ns) : 675861717819

 ================================================
 Test Parameters Used
 ================================================
-samples_per_query : 2756
-target_qps : 4.17593
+samples_per_query : 2757
+target_qps : 4.17741
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
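
For readers comparing the two runs: the change in samples_per_query tracks the change in target_qps. A quick sanity check is sketched below; the 1.1× margin over the 600 s minimum duration is inferred from the numbers in this diff, not taken from the LoadGen source.

```python
# Rough sanity check on the Offline test parameters above. The 1.1x margin is an
# assumption inferred from the numbers in this diff, not a value read from LoadGen.
MIN_DURATION_S = 600
for target_qps in (4.17593, 4.17741):  # old and new target_qps from the diff
    print(round(target_qps * MIN_DURATION_S * 1.1))  # -> 2756, then 2757
```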

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 4.13458
-test score = 4.13091
+Reference score = 4.13605
+Test score = 4.13288
 TEST PASS
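
The TEST01 check above compares the compliance run's score against the reference (submission) score. A minimal sketch of that kind of comparison is shown below; the 10% tolerance is assumed purely for illustration, and the official compliance script defines its own thresholds.

```python
# Illustrative only: compare a TEST01 compliance score with the reference score.
# The 10% tolerance is an assumed placeholder, not the official MLPerf threshold.
def performance_within_tolerance(reference: float, test: float, tol: float = 0.10) -> bool:
    return abs(test - reference) / reference <= tol

print(performance_within_tolerance(4.13605, 4.13288))  # Offline scores above -> True (TEST PASS)
```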

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/singlestream/TEST01/performance/run_1/mlperf_log_detail.txt

Lines changed: 92 additions & 92 deletions
Large diffs are not rendered by default.

closed/GATEOverflow/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/singlestream/TEST01/performance/run_1/mlperf_log_summary.txt

Lines changed: 13 additions & 13 deletions
@@ -4,38 +4,38 @@ MLPerf Results Summary
 SUT name : Server_3DUNet
 Scenario : SingleStream
 Mode : PerformanceOnly
-90th percentile latency (ns) : 438341980
+90th percentile latency (ns) : 438342403
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
 Early stopping satisfied: Yes
 Early Stopping Result:
 * Processed at least 64 queries (5934).
 * Would discard 538 highest latency queries.
-* Early stopping 90th percentile estimate: 438520623
-* Early stopping 99th percentile estimate: 504614656
+* Early stopping 90th percentile estimate: 438558932
+* Early stopping 99th percentile estimate: 504572064

 ================================================
 Additional Stats
 ================================================
 QPS w/ loadgen overhead : 4.43
 QPS w/o loadgen overhead : 4.43

-Min latency (ns) : 28868490
-Max latency (ns) : 514085641
-Mean latency (ns) : 225585142
-50.00 percentile latency (ns) : 176219743
-90.00 percentile latency (ns) : 438341980
-95.00 percentile latency (ns) : 503845557
-97.00 percentile latency (ns) : 504166942
-99.00 percentile latency (ns) : 504519643
-99.90 percentile latency (ns) : 505430196
+Min latency (ns) : 28864188
+Max latency (ns) : 514333133
+Mean latency (ns) : 225589431
+50.00 percentile latency (ns) : 176185278
+90.00 percentile latency (ns) : 438342403
+95.00 percentile latency (ns) : 503793880
+97.00 percentile latency (ns) : 504105976
+99.00 percentile latency (ns) : 504464399
+99.90 percentile latency (ns) : 505481682

 ================================================
 Test Parameters Used
 ================================================
 samples_per_query : 1
-target_qps : 4.93586
+target_qps : 4.93491
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
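
Most of this commit is mechanical churn in mlperf_log_summary.txt files like the two above. A hypothetical helper (not part of the submission tooling) for pulling the key/value fields out of such a summary so two runs can be compared field by field:

```python
from pathlib import Path

def parse_summary(path: str) -> dict[str, str]:
    """Extract "key : value" fields from an mlperf_log_summary.txt file."""
    fields = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line and not line.startswith("="):
            key, _, value = line.partition(":")
            fields[key.strip(" *")] = value.strip()
    return fields

# Hypothetical usage: compare one field between two runs.
# old = parse_summary("run_old/mlperf_log_summary.txt")
# new = parse_summary("run_new/mlperf_log_summary.txt")
# print(old["90th percentile latency (ns)"], new["90th percentile latency (ns)"])
```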

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 438082736
-test score = 438520623
+Reference score = 438151242.0
+Test score = 438558932.0
 TEST PASS

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/README.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ pip install -U mlcflow

 mlc rm cache -f

-mlc pull repo mlcommons@mlperf-automations --checkout=cc1d43d1d5eeebee7efa08a1aa0f1cc62fcb1560
+mlc pull repo mlcommons@mlperf-automations --checkout=08dcb1037030ce0a1305dd02e8743652d1b146d6


 ```
@@ -41,4 +41,4 @@ Model Precision: int8
 `DICE`: `0.86236`, Required accuracy for closed division `>= 0.86084`

 ### Performance Results
-`Samples per second`: `4.13458`
+`Samples per second`: `4.13605`
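
On the accuracy line above: the closed-division threshold for 3d-unet-99.9 is 99.9% of the reference model's mean DICE score, which is 0.86170 in the MLPerf reference implementation (a recalled value, not one taken from this diff). A short check of the arithmetic:

```python
# 3d-unet-99.9 accuracy target check. reference_dice is assumed from the MLPerf
# reference implementation (0.86170); the result should match the README's ">= 0.86084".
reference_dice = 0.86170
target = round(0.999 * reference_dice, 5)
measured = 0.86236

print(target)              # 0.86084
print(measured >= target)  # True -> accuracy requirement met
```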

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/accuracy_console.out

Lines changed: 19 additions & 19 deletions
@@ -1,8 +1,8 @@
-[2025-06-01 06:45:54,205 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_0aef8e1065ca
-[2025-06-01 06:45:54,486 harness.py:249 INFO] The harness will load 3 plugins: ['build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so', 'build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so', 'build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so']
-[2025-06-01 06:45:54,486 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_0aef8e1065ca_TRT/3d-unet-99.9/Offline
-[2025-06-01 06:45:54,486 __init__.py:46 INFO] Running command: ./build/bin/harness_3dunet --plugins="build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so,build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so,build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=43 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/kits19/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf" --tensor_path="build/preprocessed_data/KiTS19/inference/int8" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf" --unet3d_sw_gaussian_patch_path="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy" --gpu_engines="./build/engines/Nvidia_0aef8e1065ca/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan" --max_dlas=0 --slice_overlap_patch_kernel_cg_impl=false --scenario Offline --model 3d-unet
-[2025-06-01 06:45:54,486 __init__.py:53 INFO] Overriding Environment
+[2025-07-01 12:08:39,020 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_215952ba71cc
+[2025-07-01 12:08:39,293 harness.py:249 INFO] The harness will load 3 plugins: ['build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so', 'build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so', 'build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so']
+[2025-07-01 12:08:39,294 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_215952ba71cc_TRT/3d-unet-99.9/Offline
+[2025-07-01 12:08:39,294 __init__.py:46 INFO] Running command: ./build/bin/harness_3dunet --plugins="build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so,build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so,build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=43 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/kits19/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf" --tensor_path="build/preprocessed_data/KiTS19/inference/int8" --use_graphs=false --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3b7fac534a2e4f62847a6722c46f3854.conf" --unet3d_sw_gaussian_patch_path="/home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy" --gpu_engines="./build/engines/Nvidia_215952ba71cc/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan" --max_dlas=0 --slice_overlap_patch_kernel_cg_impl=false --scenario Offline --model 3d-unet
+[2025-07-01 12:08:39,294 __init__.py:53 INFO] Overriding Environment
 benchmark : Benchmark.UNET3D
 buffer_manager_thread_count : 0
 data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/data
@@ -11,23 +11,23 @@ gpu_copy_streams : 1
 gpu_inference_streams : 1
 input_dtype : int8
 input_format : linear
-log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_32fceb49/repo/closed/NVIDIA/build/logs/2025.06.01-06.45.53
+log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_32fceb49/repo/closed/NVIDIA/build/logs/2025.07.01-12.08.38
 map_path : data_maps/kits19/val_map.txt
 mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf
 offline_expected_qps : 0.0
 precision : int8
 preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data
 scenario : Scenario.Offline
 slice_overlap_patch_kernel_cg_impl : False
-system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='13th Gen Intel(R) Core(TM) i9-13900K', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=131.63447200000002, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=131634472000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=None, system_id='Nvidia_0aef8e1065ca')
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='13th Gen Intel(R) Core(TM) i9-13900K', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=131.63447200000002, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=131634472000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=None, system_id='Nvidia_215952ba71cc')
 tensor_path : build/preprocessed_data/KiTS19/inference/int8
 test_mode : AccuracyOnly
 unet3d_sw_gaussian_patch_path : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_f962b684/preprocessed_data/KiTS19/etc/gaussian_patches.npy
 use_deque_limit : True
 use_graphs : False
-user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf
-system_id : Nvidia_0aef8e1065ca
-config_name : Nvidia_0aef8e1065ca_3d-unet_Offline
+user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3b7fac534a2e4f62847a6722c46f3854.conf
+system_id : Nvidia_215952ba71cc
+config_name : Nvidia_215952ba71cc_3d-unet_Offline
 workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99_9, PowerSetting.MaxP)
 optimization_level : plugin-enabled
 num_profiles : 1
@@ -39,23 +39,23 @@ power_limit : None
 cpu_freq : None
 &&&& RUNNING MLPerf_Inference_3DUNet_Harness # ./build/bin/harness_3dunet
 [I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_08fd7192/inference/mlperf.conf
-[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/25ed99cceccc4ff6bda93b605eee4f85.conf
+[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3b7fac534a2e4f62847a6722c46f3854.conf
 Creating QSL.
 Finished Creating QSL.
 Setting up SUT.
 [I] [TRT] Loaded engine size: 31 MiB
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 80, GPU 1097 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 80, GPU 1107 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 79, GPU 1097 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 79, GPU 1107 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +29, now: CPU 0, GPU 29 (MiB)
-[I] Device:0: ./build/engines/Nvidia_0aef8e1065ca/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan has been successfully loaded.
+[I] Device:0: ./build/engines/Nvidia_215952ba71cc/3d-unet/Offline/3d-unet-Offline-gpu-b8-int8.custom_k_99_9_MaxP.plan has been successfully loaded.
 [E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 49, GPU 1813 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 49, GPU 1821 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 48, GPU 1813 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 48, GPU 1821 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2218, now: CPU 0, GPU 2247 (MiB)
 [I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: true
 Finished setting up SUT.
 Starting warmup. Running for a minimum of 5 seconds.
-Finished warmup. Ran for 5.46776s.
+Finished warmup. Ran for 5.46786s.
 Starting running actual test.

 No warnings encountered during test.
@@ -72,8 +72,8 @@ Device Device:0 processed:
 PerSampleCudaMemcpy Calls: 43
 BatchedCudaMemcpy Calls: 0
 &&&& PASSED MLPerf_Inference_3DUNet_Harness # ./build/bin/harness_3dunet
-[2025-06-01 06:46:11,679 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2025-06-01 06:46:11,679 __init__.py:46 INFO] Running command: python3 code/3d-unet/tensorrt/accuracy_kits.py --log_file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy/mlperf_log_accuracy.json
+[2025-07-01 12:08:56,495 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2025-07-01 12:08:56,495 __init__.py:46 INFO] Running command: python3 code/3d-unet/tensorrt/accuracy_kits.py --log_file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/3d-unet-99.9/offline/accuracy/mlperf_log_accuracy.json
 Loading necessary metadata...
 Loading loadgen accuracy log...
 Running postprocessing...

closed/GATEOverflow/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/3d-unet-99.9/offline/cpu_info.json

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@
 "MLC_HOST_CPU_L1I_CACHE_SIZE": "384 KiB",
 "MLC_HOST_CPU_L2_CACHE_SIZE": "24 MiB",
 "MLC_HOST_CPU_TOTAL_LOGICAL_CORES": "32",
+"MLC_HOST_CPU_TOTAL_PHYSICAL_CORES": "24",
+"MLC_HOST_CPU_PHYSICAL_CORES_LIST": "0-23",
 "MLC_HOST_MEMORY_CAPACITY": "128G",
 "MLC_HOST_DISK_CAPACITY": "9.1T"
 }
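
The two new fields above report the physical core count and core ID range alongside the existing logical core count. A hedged sketch of one way such values could be derived on Linux; the actual detection logic in mlperf-automations may differ, and the "0-23" range format assumes contiguous core IDs:

```python
import subprocess

# "lscpu -p=CPU,CORE" prints one "logical_cpu,core_id" pair per line; comment lines start with '#'.
out = subprocess.run(["lscpu", "-p=CPU,CORE"], capture_output=True, text=True, check=True).stdout
core_ids = {int(line.split(",")[1]) for line in out.splitlines() if line and not line.startswith("#")}

print(len(core_ids))                       # e.g. 24   -> MLC_HOST_CPU_TOTAL_PHYSICAL_CORES
print(f"{min(core_ids)}-{max(core_ids)}")  # e.g. 0-23 -> MLC_HOST_CPU_PHYSICAL_CORES_LIST
```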
