@@ -8,6 +8,90 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

+ New in 2022.6.0
+ ===============
+ News
+ ------------
+ - The `oneAPI DPC++ Library Manual Migration Guide`_ is now available to simplify the migration of Thrust* and CUB* APIs from CUDA*.
+ - ``radix_sort`` and ``radix_sort_by_key`` kernel templates were moved into the
+   ``oneapi::dpl::experimental::kt::gpu::esimd`` namespace. The former ``oneapi::dpl::experimental::kt::esimd``
+   namespace is deprecated and will be removed in a future release.
+ - The ``for_loop``, ``for_loop_strided``, ``for_loop_n``, and ``for_loop_n_strided`` algorithms
+   in the ``oneapi::dpl::experimental`` namespace now fail by design when used with device execution policies.
+
+ New Features
+ ------------
+ - Added an experimental ``inclusive_scan`` kernel template algorithm residing in
+   the ``oneapi::dpl::experimental::kt::gpu`` namespace.
+ - ``radix_sort`` and ``radix_sort_by_key`` kernel templates are extended with overloads for out-of-place sorting.
+   These overloads preserve the input sequence and sort data into the user-provided output sequence.
+ - Improved performance of the ``reduce``, ``min_element``, ``max_element``, ``minmax_element``, ``is_partitioned``,
+   ``lexicographical_compare``, ``binary_search``, ``lower_bound``, and ``upper_bound`` algorithms with device policies.
+ - ``sort``, ``stable_sort``, and ``sort_by_key`` algorithms now use Radix sort [#fnote1]_
+   for sorting ``sycl::half`` elements compared with ``std::less`` or ``std::greater``.
+
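Out-of-place sorting, as described for the new ``radix_sort`` and ``radix_sort_by_key`` overloads, means the input is read but never modified, and the sorted result is written to a separate output sequence. The kernel-template signatures are not reproduced here. For that reason, the sketch below uses the host-side standard ``std::partial_sort_copy`` purely as an analogy, not as the oneDPL API. The function name ``sorted_copy`` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper illustrating out-of-place sorting with the standard library
// (not the oneDPL kernel-template API): `input` is left unchanged and the sorted
// values are written to a separate output sequence of the same size.
std::vector<int> sorted_copy(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    // With an output range as large as the input, partial_sort_copy sorts every element.
    std::partial_sort_copy(input.begin(), input.end(), output.begin(), output.end());
    return output;
}
```

For an input of ``{5, 1, 4, 2}``, the output holds ``{1, 2, 4, 5}`` while the input still holds ``{5, 1, 4, 2}``.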
+ Fixed Issues
+ ------------
+ - Fixed compilation errors when using ``reduce``, ``min_element``, ``max_element``, ``minmax_element``,
+   ``is_partitioned``, and ``lexicographical_compare`` with Intel® oneAPI DPC++/C++ Compiler 2023.0 and earlier.
+ - Fixed possible data races in the following algorithms used with device execution policies:
+   ``remove_if``, ``unique``, ``inplace_merge``, ``stable_partition``, ``partial_sort_copy``, ``rotate``.
+ - Fixed excessive copying of data in ``std::vector`` allocated with a USM allocator for standard library
+   implementations that have allocator information in the ``std::vector::iterator`` type.
+ - Fixed an issue where checking ``std::is_default_constructible`` for ``transform_iterator`` with a functor
+   that is not default-constructible could cause a build error or an incorrect result.
+ - Fixed handling of `sycl device copyable`_ for internal and public oneDPL types.
+ - Fixed handling of ``std::reverse_iterator`` as input to oneDPL algorithms using a device policy.
+ - Fixed ``set_intersection`` to always copy from the first input sequence to the output,
+   where previously some calls would copy from the second input sequence.
+ - Fixed compilation errors when using ``oneapi::dpl::zip_iterator`` with the oneTBB backend and C++20.
+
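The corrected ``set_intersection`` behavior matches the rule of the C++ standard library, where equivalent elements are copied from the first input range. The minimal host-side sketch below uses ``std::set_intersection`` with no oneDPL policy involved. It compares only the key of each pair and tags every pair with its source sequence, which makes the rule observable. ``Item`` and ``intersect_by_key`` are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

using Item = std::pair<int, char>;  // (key, tag identifying the source sequence)

// Hypothetical helper: pairs are compared by key only, so {2, 'a'} and {2, 'b'}
// are equivalent; the standard algorithm copies the matching element from `first`.
std::vector<Item> intersect_by_key(const std::vector<Item>& first,
                                   const std::vector<Item>& second) {
    std::vector<Item> out;
    std::set_intersection(first.begin(), first.end(), second.begin(), second.end(),
                          std::back_inserter(out),
                          [](const Item& x, const Item& y) { return x.first < y.first; });
    return out;
}
```

Called with ``{{1,'a'},{2,'a'},{3,'a'}}`` and ``{{2,'b'},{3,'b'},{4,'b'}}``, the result holds ``{2,'a'}`` and ``{3,'a'}``. The tags come from the first sequence.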
+ Known Issues and Limitations
+ ----------------------------
+ New in This Release
+ ^^^^^^^^^^^^^^^^^^^
+ - The ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
+   when used with an FPGA policy.
+
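Code that targets both FPGA and other devices could check the FPGA restriction on the ``histogram`` output value type at compile time. The trait below is a hypothetical sketch that encodes the stated rule: an integral type no larger than 4 bytes. ``fpga_histogram_output_ok_v`` and ``histogram_counter_t`` are not oneDPL names.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait, not part of oneDPL: encodes the documented FPGA restriction
// that the histogram output value type is integral and at most 4 bytes wide.
template <typename T>
inline constexpr bool fpga_histogram_output_ok_v = std::is_integral_v<T> && sizeof(T) <= 4;

// Example guard for a counter type chosen for an FPGA histogram call.
using histogram_counter_t = std::uint32_t;
static_assert(fpga_histogram_output_ok_v<histogram_counter_t>,
              "histogram output type must be integral and no larger than 4 bytes on FPGA");
```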
+ Existing Issues
+ ^^^^^^^^^^^^^^^
+ See the oneDPL Guide for other `restrictions and known limitations`_.
+
+ - When compiled with the ``-fsycl-pstl-offload`` option of the Intel® oneAPI DPC++/C++ Compiler and with
+   ``libstdc++`` version 8 or ``libc++``, ``oneapi::dpl::execution::par_unseq`` offloads
+   standard parallel algorithms to the SYCL device similarly to ``std::execution::par_unseq``,
+   in accordance with the ``-fsycl-pstl-offload`` option value.
+ - When using the dpl modulefile to initialize the user's environment and compiling with the ``-fsycl-pstl-offload``
+   option of the Intel® oneAPI DPC++/C++ Compiler, a linking issue or program crash may be encountered because the directory
+   containing ``libpstloffload.so`` is not included in the search path. To avoid the issue, use ``env/vars.sh`` to configure the working
+   environment instead.
+ - Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
+ - For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
+   used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
+   the provided input and destination iterators must be equality comparable.
+   Furthermore, the equality comparison of the input and destination iterators must evaluate to true.
+   If these conditions are not met, the result of these algorithm calls is undefined.
74
+ - ``sort ``, ``stable_sort ``, ``sort_by_key ``, ``partial_sort_copy `` algorithms may work incorrectly or cause
75
+ a segmentation fault when used a DPC++ execution policy for CPU device, and built
76
+ on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
77
+ To avoid the issue, pass ``-fsycl-device-code-split=per_kernel `` option to the compiler.
+ - Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
+   ``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, and ``reduce_by_segment``
+   with the ``unseq`` or ``par_unseq`` policy when compiled by the Intel® oneAPI DPC++/C++ Compiler
+   with the ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, or ``-qopenmp-simd`` options on Linux.
+   To avoid the issue, pass the ``-fopenmp`` or ``-fopenmp-simd`` option instead.
+ - Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
+   with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
+   and executed on GPU devices.
+   For a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION`` macro to ``1`` before
+   including oneDPL header files.
+ - ``std::tuple`` and ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
+ - ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or ``swap`` member function
+   in the Microsoft* Visual C++ standard library.
+ - The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with the ``-fno-sycl-unnamed-lambda`` option.
+ - STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
+   the Microsoft* Visual C++ standard library.
+
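An in-place call that satisfies the stated conditions passes the same iterator as both the input begin and the destination. The two iterators then have the same type and compare equal. The sketch below uses the standard ``std::exclusive_scan`` overload without an execution policy so that it builds without oneDPL. With oneDPL, the same iterator arguments would follow the ``unseq`` or ``par_unseq`` policy. The function name ``in_place_exclusive_scan`` is hypothetical.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: exclusive scan performed in place. The destination iterator is
// v.begin(), the same iterator as the input begin, so the iterators have the same type
// and compare equal, which is the condition listed for unseq/par_unseq in oneDPL.
std::vector<int> in_place_exclusive_scan(std::vector<int> v) {
    std::exclusive_scan(v.begin(), v.end(), v.begin(), 0);
    return v;
}
```

For ``{1, 2, 3, 4}`` with an initial value of ``0``, the result is ``{0, 1, 3, 6}``.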
New in 2022.5.0
===============
@@ -661,8 +745,8 @@ Known Issues and Limitations
(including ``std::ldexp``, ``std::frexp``, ``std::sqrt(std::complex<float>)``) require device support
for double precision.

- .. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with
-    ``std::less`` or ``std::greater``, otherwise Merge sort.
+ .. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
+    ``sycl::half`` (since oneDPL 2022.6) compared with ``std::less`` or ``std::greater``, otherwise Merge sort.
.. _`the oneDPL Specification`: https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html
.. _`oneDPL Guide`: https://oneapi-src.github.io/oneDPL/index.html
.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
@@ -671,3 +755,4 @@ Known Issues and Limitations
.. _`Macros`: https://oneapi-src.github.io/oneDPL/macros.html
.. _`2022.0 Changes`: https://oneapi-src.github.io/oneDPL/oneDPL_2022.0_changes.html
.. _`sycl device copyable`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable
+ .. _`oneAPI DPC++ Library Manual Migration Guide`: https://www.intel.com/content/www/us/en/developer/articles/guide/oneapi-dpcpp-library-manual-migration.html