DPC++ daily 2022-08-16
Pre-release
[SYCL] Improve range reduction performance on CPU (#6164)

The performance improvement is the result of two complementary changes:

1. Using an alternative heuristic to select the work-group size on the CPU. Keeping work-groups small simplifies the combination of partial results and reduces the number of temporary variables.

2. Adjusting the mapping of the range to an ND-range. Breaking the range into contiguous chunks that are assigned to each work-group results in streaming access patterns that are better suited to prefetching hardware.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
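
The chunked mapping described above can be illustrated with a minimal sketch in plain C++. This is a serial emulation under stated assumptions, not the DPC++ runtime code: the names `Chunk`, `chunk_for_group`, and `chunked_sum` are hypothetical, the group count and group size are passed in by the caller rather than chosen by any heuristic, and letting the last chunk absorb the remainder is only one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper type: half-open bounds [begin, end) of one chunk.
struct Chunk {
  std::size_t begin;
  std::size_t end;
};

// Split [0, n) into num_groups contiguous chunks, one per group.
// Assumption: the last group absorbs any remainder.
Chunk chunk_for_group(std::size_t n, std::size_t num_groups,
                      std::size_t group_id) {
  assert(num_groups > 0 && group_id < num_groups);
  const std::size_t per_group = n / num_groups;
  const std::size_t begin = group_id * per_group;
  const std::size_t end =
      (group_id == num_groups - 1) ? n : begin + per_group;
  return {begin, end};
}

// Serial emulation of a sum reduction over data.
// Each emulated work-item strides through its group's contiguous chunk,
// so the group as a whole reads one contiguous block of memory.
long long chunked_sum(const std::vector<int>& data, std::size_t num_groups,
                      std::size_t group_size) {
  assert(num_groups > 0 && group_size > 0);
  long long total = 0;
  for (std::size_t g = 0; g < num_groups; ++g) {
    const Chunk c = chunk_for_group(data.size(), num_groups, g);
    // One partial result per work-item; a small group keeps this short.
    std::vector<long long> partial(group_size, 0);
    for (std::size_t local = 0; local < group_size; ++local) {
      for (std::size_t i = c.begin + local; i < c.end; i += group_size) {
        partial[local] += data[i];
      }
    }
    // Combine the group's partial results, then fold into the total.
    total += std::accumulate(partial.begin(), partial.end(), 0LL);
  }
  return total;
}
```

For example, calling `chunk_for_group(10, 3, g)` for `g` = 0, 1, 2 gives the chunks [0, 3), [3, 6), and [6, 10). A smaller `group_size` means fewer entries in `partial` to combine per group, which is the effect the first change relies on.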