@yuanxion
Contributor

@yuanxion yuanxion commented Oct 17, 2025

Details

Fixed the GPU memory allocation issue in the matrix_nms_ref stage 0 kernel.

Description of the issue

Symptom

The pp_yolo model fails to run inference: CL_OUT_OF_RESOURCES is reported when creating the stage 0 matrix_nms_kernel.

Root cause

  • The matrix_nms_ref stage 0 kernel tries to allocate 1 * 80 * 22743 * 22742 / 2 * 4 bytes = 82.7 GB of GPU memory (batch 1 * 80 classes = 80 GPU work items). This exceeds the GPU's total memory, so creating matrix_nms_kernel fails.
  • In addition, matrix_nms_kernel contains a large "for loop" (22743 iterations), which consumes a lot of GPU resources.
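
The arithmetic in the first bullet can be checked with a short sketch. The function name and parameter names below are hypothetical, and the reading of the factors is an assumption: 22743 * 22742 / 2 is taken to be the number of unordered box pairs, and 4 the size in bytes of one f32 value.

```python
def stage0_private_bytes(batch, classes, boxes, bytes_per_value=4):
    """Bytes needed if every (batch, class) work item stores one value per box pair.

    Hypothetical helper for illustration; not part of the GPU plugin.
    """
    work_items = batch * classes                # 1 * 80 = 80 work items
    pairs_per_item = boxes * (boxes - 1) // 2   # assumed: one entry per unordered pair
    return work_items * pairs_per_item * bytes_per_value


total = stage0_private_bytes(batch=1, classes=80, boxes=22743)
print(total)  # 82755408960 bytes, i.e. the 82.7 GB quoted above
```

Each work item alone would need a little over 1 GB under these assumptions, which is far beyond what private memory can hold.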

How to fix it

  • Use a global memory buffer allocated by the host instead of private memory.
  • Split the "for loop" into chunks.
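
The two changes can be illustrated with a minimal host-side sketch. This is not the kernel code from this PR: the function names, the packed lower-triangle layout, and the chunk size are all assumptions chosen for illustration. The caller allocates the buffer once (standing in for the host-allocated global buffer), and the long loop over boxes is walked one chunk at a time.

```python
def chunk_ranges(total, chunk_size):
    """Yield half-open (start, end) ranges that cover [0, total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def fill_pairwise(boxes_count, pair_value, out, chunk_size):
    """Fill the caller-allocated buffer `out` with pair_value(i, j) for all j < i.

    Rows are processed chunk by chunk instead of in one long loop.
    Layout (assumed): packed lower triangle, row i starts at i * (i - 1) // 2.
    """
    needed = boxes_count * (boxes_count - 1) // 2
    if len(out) < needed:
        raise ValueError("output buffer is too small")
    for start, end in chunk_ranges(boxes_count, chunk_size):
        for i in range(start, end):
            base = i * (i - 1) // 2
            for j in range(i):
                out[base + j] = pair_value(i, j)


# Usage: the buffer is allocated by the caller, not inside the routine.
n = 5
buffer = [0] * (n * (n - 1) // 2)
fill_pairwise(n, lambda i, j: i * 10 + j, buffer, chunk_size=2)
print(buffer)  # [10, 20, 21, 30, 31, 32, 40, 41, 42, 43]
```

On a GPU, each chunk would presumably map to a bounded slice of the loop per dispatch or per iteration group; how the actual kernel divides the work is not stated in this description.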

The code and line that caused this issue

Reproduction step and snapshot

  • benchmark_app
    benchmark_app -inference_only false -b 1 -d GPU.0 -hint none -infer_precision f32 -m FP32/1/ov/pp-yolo.xml

Problematic graph

  • N/A

Checklist

  • Is it a proper fix? (not a workaround)
  • Did you include a test case for this fix, if necessary? Yes, the test case matrix_nms_test_inputs.get_matrix_nms_large_value_of_max_boxes_per_class was added.
  • Did you review existing tests that can be extended to cover this scenario? Which test did you review? No existing test covers this issue.

Tickets:

@yuanxion yuanxion requested review from a team as code owners October 17, 2025 08:44
@yuanxion yuanxion added this to the 2025.4 milestone Oct 17, 2025
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Oct 17, 2025
@wilson-seok
Contributor

@yuanxion Please add func or unit test for this case.

@yuanxion
Contributor Author

@yuanxion Please add func or unit test for this case.

Thanks, added.

@yuanxion yuanxion force-pushed the fix-matrix-nms branch 3 times, most recently from c4156cf to bd1c805 Compare October 24, 2025 06:00
@yuanxion yuanxion marked this pull request as draft October 24, 2025 08:09
@yuanxion yuanxion marked this pull request as ready for review October 24, 2025 11:39
Signed-off-by: yuan.xiong <yuan.xiong@intel.com>
@wilson-seok
Contributor

LGTM

@wilson-seok wilson-seok self-requested a review October 28, 2025 06:05
Labels

category: GPU OpenVINO GPU plugin

2 participants