Commit e526f3c

AlexeySachkov, bader, and steffenlarsen authored
[SYCL][Doc] Update release notes (#7033)
Co-authored-by: Alexey Bader <alexey.bader@intel.com> Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
1 parent 227614c commit e526f3c

File tree

1 file changed

+290
-0
lines changed

sycl/ReleaseNotes.md

Lines changed: 290 additions & 0 deletions
@@ -1,3 +1,293 @@
1+
# September'22 release notes
2+
3+
Release notes for commit range [`4043dda3..0f579bae`](https://github.com/intel/llvm/compare/4043dda3...0f579bae)
4+
5+
## New features
6+
7+
### SYCL Compiler
8+
9+
- Added ability to enforce stateless memory accesses for ESIMD. [18111623]
10+
- Added support for `-fsycl-force-target` compiler option. [1d95f2ec]
11+
- Added support for `[[intel::max_reinvocation_delay]]` loop attribute. [90fa5bb0]
12+
- Added support for `-fsycl-huge-device-code` compiler option, which allows
13+
linking object files larger than 2GB. [f963062c]
14+
- Added support for compiling `.cu` files with SYCL compiler. [e76ad72f]
15+
- Added support for `assert` on HIP backend. [ade1870a]
16+
- Enabled CXX standard library functions for CUDA backend. [1fe92c54]
17+
- Implemented group collective built-in functions for more integral types. [d4933b6f]
18+
19+
### SYCL Library
20+
21+
- Implemented SYCL 2020 callable device selectors. [64f0db7a]
22+
- Implemented SYCL 2020 standalone device selectors. [bfc7e984]
23+
- Added SYCL 2020 property interfaces for `local_accessor`, `usm_allocator`,
24+
`accessor` and `host_accessor` classes. [1136b403] [da7dcf82]
25+
- Added support for `fpga_simulator_selector`. [9bef890d]
26+
- Added support for `local_accessor`. Deprecated `target::local`. [e4423ef4]
27+
- Added support for querying free device memory on Level Zero backend. [0eeef2b3]
28+
- Added support for querying free device memory on CUDA and HIP backends. [436f0d89]
29+
- Implemented `bfloat16` conversions from/to `float` for host. [2a383f1c]
30+
- Added support for `ext::oneapi::property::queue::discard_events` to
31+
Level Zero PI plugin. [13721204]
32+
- Added `lsc_atomic` support on ESIMD emulator. [0c051a89]
33+
- Added `dpas` support on ESIMD emulator. [3d506a34]
34+
- Added C++ API for `imf` libdevice built-ins. [830916a3]
35+
- Implemented `make_queue` for CUDA backend. [89460e81]
36+
- Implemented `has_native_event` and `make_event` for CUDA backend. [74369c84]
37+
- Added support for CUDA XPTI tracing. [0cd04144]
38+
- Introduced predicates for ESIMD `lsc_block_store/load`. [f44edce3]
39+
- Added experimental `set_kernel_properties` API and `use_double_grf` property
40+
for ESIMD. [9a55da53]
41+
- Added "eager initialization" mode to Level Zero PI plugin. It might result
42+
in unnecessary work done by the plugin, but ensures the fastest possible
43+
execution on hot and reportable paths. [c1459598]
44+
- Added full support for element-wise operations on `joint_matrix` on CUDA
45+
backend including `bfloat16` support. [0a1d751b]
46+
- Implemented `group::get_linear_id(int)` method [6e83c127]
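
The callable device selectors listed above follow the SYCL 2020 scoring rule: a selector is any callable that maps a device to an integer score, the highest-scoring device wins, and a negative score excludes a device. The sketch below models that rule in plain C++ so it builds without a SYCL toolchain. `MockDevice`, `pick_device` and `prefer_large_gpu` are hypothetical names invented for this illustration and are not part of the SYCL API.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for sycl::device (illustration only).
struct MockDevice {
    std::string name;
    bool is_gpu;
    int compute_units;
};

// A selector is any callable that maps a device to an integer score.
using Selector = std::function<int(const MockDevice&)>;

// Returns the device with the highest non-negative score.
// Devices with a negative score are never chosen; if no device
// qualifies, an exception is thrown. Ties go to the first device seen
// (an arbitrary choice made by this sketch).
inline MockDevice pick_device(const std::vector<MockDevice>& devices,
                              const Selector& selector) {
    const MockDevice* best = nullptr;
    int best_score = -1;
    for (const MockDevice& d : devices) {
        const int score = selector(d);
        if (score >= 0 && score > best_score) {
            best = &d;
            best_score = score;
        }
    }
    if (best == nullptr) {
        throw std::runtime_error("no device matches the selector");
    }
    return *best;
}

// Example selector: reject non-GPU devices, prefer more compute units.
inline int prefer_large_gpu(const MockDevice& d) {
    return d.is_gpu ? d.compute_units : -1;
}
```

With a SYCL 2020 implementation, a callable taking `const sycl::device&` and returning `int` can be passed directly to the `sycl::queue` or `sycl::device` constructors in place of the deprecated SYCL 1.2.1 selector classes.
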
47+
48+
### Documentation
49+
50+
- Added stateful to stateless memory access conversion
51+
[design document](sycl/doc/design/ESIMDStatelesAccessors.md). [3e03f300]
52+
- Added [`sycl_ext_oneapi_complex`](sycl/doc/extensions/proposed/sycl_ext_oneapi_complex.asciidoc)
53+
extension proposal. [01589da5]
54+
- Updated [`sycl_ext_intel_fpga_device_selector`](sycl/doc/extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc)
55+
extension to add `fpga_simulator_selector`. [9bef890d]
56+
- Added [`sycl_ext_intel_fpga_kernel_interface_properties`](sycl/doc/extensions/proposed/sycl_ext_intel_fpga_kernel_interface_properties.asciidoc) extension proposal. [4b6bd14b]
57+
- Updated [`sycl_ext_oneapi_complex_algorithms`](sycl/doc/extensions/proposed/sycl_ext_oneapi_complex_algorithms.asciidoc)
58+
extension to include `sycl::complex` as supported type for algorithms. [07c5b48f]
59+
- Clarified sub-group size calculation in [`sycl_ext_oneapi_invoke_simd`](sycl/doc/extensions/experimental/sycl_ext_oneapi_invoke_simd.asciidoc) extension spec. [9b33ad0f]
60+
- Updated [`sycl_ext_oneapi_accessor_properties`](sycl/doc/extensions/supported/sycl_ext_oneapi_accessor_properties.asciidoc)
61+
to mark `has_property` API as `noexcept`. [7805aa3f]
62+
- Updated [`sycl_ext_intel_device_info`](sycl/doc/extensions/supported/sycl_ext_intel_device_info.md)
63+
to support querying free device memory. [0eeef2b3]
64+
- Updated [`sycl_ext_oneapi_matrix`](sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc)
65+
with description of new matrix features. [770f540d]
66+
- Moved [`sycl_ext_oneapi_invoke_simd`](sycl/doc/extensions/experimental/sycl_ext_oneapi_invoke_simd.asciidoc)
67+
extension specification from `proposed` to `experimental` because
68+
implementation is available. [6bee3440]
69+
70+
## Improvements
71+
72+
### SYCL Library
73+
74+
- Ensured that the correct `errc` is thrown for an unassociated placeholder
75+
accessor. [4f9935ad]
76+
- Removed dependency on OpenCL ICD Loader from the runtime. [90e8b5ef]
77+
- Added support for `ZEBIN` format to persistent caching mechanism. [34dcf83d]
78+
- Added identification mechanism for binaries in newer `ZEBIN` format. [f4dee549]
79+
- Switched to use `struct` information descriptors in accordance with SYCL 2020.
80+
Removed some deprecated information queries. [b3cbda57]
81+
- Updated `kernel_device_specific::max_sub_group_size` query to match SYCL 2020
82+
spec. Deprecated the old variant. [7842d056]
83+
- Deprecated SYCL 1.2.1 device selectors. [c058380f]
84+
- Improved error messages reported for unsupported device partitioning. [1c9ddbaa]
85+
- Made `device` and `platform` default to `default_selector_v`. [b32dd41a]
86+
- Deprecated `address_space::constant_space`. [351b123a]
87+
- Marked `sycl::exception::has_context` as `noexcept`. [ad923c9a]
88+
- Improved range reductions performance on CPU. [3323da65]
89+
- Made `sycl::exception` `nothrow` copy constructible. [289e33da]
90+
- Marked `has_property` methods as `noexcept`. [417b5a29]
91+
- Improved `sycl::event::get_profiling_info` exception message when `event` is
92+
default constructed. [2e86cd459]
93+
- Added a diagnostic (in form of `static_assert`) about kernel lambda size
94+
mismatch between host and device. [d278c671] [ec179b7a] [f417a88e]
95+
- Updated `pipes` class to throw exceptions if used on host. [eab29696]
96+
- Updated ESIMD Emulator PI plugin to report support for `cl_khr_fp64`
97+
extension. [398571a5]
98+
- Updated Level Zero plugin to prefer copy engine for memory read/write
99+
operations. [65c3ea29]
100+
- Optimized some memory transfers. [92d35cd1]
101+
- Enabled event caching in Level Zero PI plugin. [a41b33c3]
102+
- Optimized some reductions with `parallel_for` accepting `sycl::range`
103+
for discrete GPUs. [c22a5d3f]
104+
- Improved performance of event synchronization on CUDA backend. [c4f326aa]
105+
- Added ability to use descendent devices of context members within that
106+
context. Not supported with OpenCL backend yet. [a0c8c503] [78a483c7]
107+
- Added support for querying `atomic64` device capability with HIP backend. [cb190fc2]
108+
- Enabled FTZ operations for CUDA/PTX backend via
109+
`-fcuda-flush-denormals-to-zero`. [e8e7ae83]
110+
- Improved error message about incorrect kernel argument types with CUDA backend. [2542e6a8]
111+
- Limited allowed argument types for `rol/ror` ESIMD functions to better
112+
represent HW capabilities. [b05f256c]
113+
- Implemented `mem_advise` reset and managed memory checks for CUDA backend. [fe18839c]
114+
- Added concurrent memory check to `mem_advise` on CUDA backend. [33746d8d]
115+
- Enabled multiple HIP streams per SYCL queue. [e0c40a9f]
116+
- Implemented lazy mechanism of setting context for default-constructed events. [ed92c4ca]
117+
- Improved performance for multi-dimensional accessors with multiple accesses
118+
in a kernel. [7c58b9a6]
119+
120+
### SYCL Compiler
121+
122+
- Increased max `_BitInt` size to 4096 for FPGA target. [db5f72a8] [3f06cad0]
123+
- Removed deprecation message for `[[intel::disable_loop_pipelining]]` attribute. [07201f56]
124+
- Allowed `__builtin_assume_aligned` to be called from device code. [24937eac]
125+
- Improved link step performance when `per_kernel` device code split is used. [84de9d6d]
126+
- Added support for `SYCL_EXTERNAL` on `device_global` variables. [8b958f67]
127+
- Updated `__builtin_intel_fpga_mem` to accept more parameters. [231338dc]
128+
- Updated `ivdep` attribute to allow `safelen = 0`. [558b3ba4]
129+
- Improved linking with `sycl.lib` on Windows. [404d2816]
130+
- Implemented more diagnostics about incorrect `device_global` usages. [12657218]
131+
- Improved library resolution for `libsycl.so`. [4ce19d69]
132+
- Improved diagnostics when linking with mismatched objects. [0e0202ee]
133+
- Added a warning for floating-point size changes after implicit conversions. [e4f5d55f]
134+
- Made `invoke_simd` convert its argument to appropriate types. [038764fd]
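
As a reminder of what `__builtin_assume_aligned` does, here is a minimal host-side sketch using the GCC/Clang builtin. The notes above only state that the builtin may now be called from device code; that kernel code would use it the same way is an assumption, and `sum_aligned64` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums n floats from a buffer the caller promises is 64-byte aligned.
inline float sum_aligned64(const float* data, std::size_t n) {
    // Check the promise in debug builds: a false promise is undefined behavior.
    assert(reinterpret_cast<std::uintptr_t>(data) % 64 == 0);
    // The builtin returns its first argument and lets the optimizer
    // assume the result is aligned to the given byte boundary.
    const float* p =
        static_cast<const float*>(__builtin_assume_aligned(data, 64));
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}
```

A caller would typically obtain such a buffer with `alignas(64)` or an aligned allocation before passing it in.
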
135+
136+
### Documentation
137+
138+
- Removed explicit `cl` namespace references. [433ea5c7]
139+
- Added a short guideline on using CMake with SYCL compiler. [fa603c3e]
140+
141+
## Bug fixes
142+
143+
### SYCL Library
144+
145+
- Fixed a compilation issue where it wasn't possible to pass an initializer list
146+
for dependency events vector in `queue` shortcuts with `offset`
147+
parameter. [f4f83d95]
148+
- Fixed `sycl::get_pointer_device` throwing an exception when it is passed a
149+
descendent device (sub-device) instead of a root device. [26d5d98b]
150+
- Fixed memory leak happening when kernel bundles are linked. [980677d9]
151+
- Fixed USM free throwing an exception when it is passed a context created for
152+
a descendent device. [c49d4944]
153+
- Fixed `accessor`'s CTAD for `g++` host compiler. [57aabe7e]
154+
- Fixed a compilation issue when using multi-dimensional `accessor`'s subscript
155+
operator. [22e3fc56]
156+
- Fixed "definition with the same mangled name" error that occurred when
156+
multiple buffer reductions were used in a kernel. [a0a4d721]
158+
- Fixed a compilation issue with SYCL math built-ins when GCC < 11.1 is used as
159+
a host compiler. [c786894f]
160+
- Fixed a compilation issue with SYCL math built-ins (such as `sycl::modf`,
161+
for example) not accepting pointers to `half`. [e2861665]
162+
- Fixed an issue with `reduction`s when MSVC is used as host compiler. [94c4b80a]
163+
- Fixed a compilation issue when fully specialized `sycl::span` is initialized
164+
from an array. [2b50820b]
165+
- Fixed a crash in Level Zero PI plugins caused by specialization constants not
166+
being used on device side, but present in a program. [9500875f]
167+
- Fixed event leak in Level Zero plugin. [6d04aa64]
168+
- Fixed an issue with sub-sub-devices in Level Zero plugin. [4b1b01bd]
169+
- Fixed an issue with incorrect `half` conversion on ESIMD emulator. [6143e55a]
170+
- Fixed a compilation issue with `abs` ESIMD function. [c72a85dd]
171+
- Fixed some warnings emitted by SYCL headers when compiled in C++20 mode. [12ac4c36]
172+
- Fixed a compilation issue when using multiple bitwise shift operations
173+
in ESIMD. [40d08c23]
174+
- Fixed a crash in Level Zero PI plugin which occurs when the runtime tries to reset
175+
a command list which does not have a synchronization fence associated with it. [a61ac7a0]
176+
- Fixed a performance issue with excessive streams synchronization
177+
on CUDA backend. [5352b423]
178+
- Fixed a compilation issue with
179+
`sycl::get_native<sycl::backend::ext_oneapi_cuda>(sycl::device)` free
180+
function (intel/llvm#6653). [4d69c297]
181+
- Fixed synchronization issue for explicit dependencies (`depends_on` usage)
182+
which is blocked by host task or host accessor. [346a6c53]
183+
- Fixed an issue in Level Zero plugin which could cause barriers to not be
184+
correctly applied for an entire queue. [d01371b3]
185+
- Fixed a synchronization issue on CUDA backend. [e848c15f]
186+
- Fixed an issue with a context not properly retained with event interop on
187+
CUDA backend. [2baf1de5]
188+
- Fixed `accessor` so gdb can parse its template parameters correctly. [372cc948]
189+
- Fixed uses of common macro names in the implementation's header files. [e87adfd2]
190+
- Fixed a performance regression in Level Zero backend related to command list. [8a4777d0]
191+
192+
### SYCL Compiler
193+
194+
- Fixed cleanup of temporary files produced by unbundling archives. [7ef4b0a3]
195+
- Fixed optimizing out `device_global` variables with internal linkage. [5b2cfe21]
196+
- Fixed an issue when compiling and linking with different optimization levels
197+
could cause runtime errors. [0cc7540e]
198+
- Fixed description of `-f[no-]sycl-unnamed-lambda` compiler option. [616ecf75]
199+
- Fixed an issue when building SYCL programs in Debug mode with
200+
`Windows-Clang.cmake`. [084f34c1]
201+
- Fixed CUDA fat-binaries filename extensions in SYCL toolchain
202+
(`.o` -> `.cubin`). [2c217cfd]
203+
- Fixed an issue causing incorrect conversions involving unsigned types in
204+
ESIMD. [2a1151b8]
205+
- Fixed a crash in applications containing a mix of unnamed ESIMD and non-ESIMD
206+
kernels. [3d54d7ef]
207+
- Fixed type casting of image coordinates in built-ins implementation for CUDA
208+
backend. [338b4edc]
209+
- Fixed a linking issue when targeting AMD/HIP. [9829897a]
210+
- Fixed an issue when `op[]` was called with typedef argument under gdb. [b7bfe391]
211+
212+
## API/ABI breakages
213+
214+
- Removed deprecated `kernel::get_work_group_info` [94883470]
215+
- Removed deprecated `get_native` class method [282e7744]
216+
- Removed support for `intel::fpga_pipeline` attribute [6fe90d20]
217+
- Added `MAJOR_VERSION` to the name of the SYCL library on Windows. [c8427d6d]
218+
- Removed `sycl::program` class. [85a6833d]
219+
- Removed `ext::oneapi::reduction`. [7fa607fb]
220+
- Removed deprecated `address_space` enum values. [351b123a]
221+
- Removed `event::get` method. [da296ba3]
222+
- Removed `using namespace experimental` inside `ext::intel`. [7ddd8a97]
223+
- Made intel-specific device info descriptors namespace qualified. [0f4a0f3b]
224+
- Removed deprecated `make_queue` API. [26b7762a]
225+
- Aligned return types of `sycl::get_native` and `interop::get_native_mem`
226+
functions to conform to the SYCL 2020 spec. [78275902]
227+
- Aligned `sycl::buffer_allocator` interface with SYCL 2020 spec. [78275902]
228+
- Removed `cl` namespace from `sycl/sycl.hpp` header. [3d2b25e1]
229+
- Dropped support for compiling SYCL in less than C++17 mode. [43e713c3]
230+
- Many other ABI-breaking changes resulting from internal refactoring
231+
232+
## Known issues
233+
234+
- This release is not backwards compatible with previous releases, which means
235+
that existing SYCL applications won't work with the newer runtime without
236+
re-compilation.
237+
- Having MESA OpenCL implementation which provides no devices on a
238+
system may cause incorrect device discovery. As a workaround such an OpenCL
239+
implementation can be disabled by removing `/etc/OpenCL/vendor/mesa.icd`.
240+
- Compilation may fail on Windows in debug mode if a kernel uses
241+
`std::array`. This happens because debug version of `std::array` in
242+
Microsoft STL C++ headers calls functions that are illegal for the device
243+
code. As a workaround the following can be done:
244+
1. Dump compiler pipeline execution strings by passing `-###` option to the
245+
compiler. The compiler will print the internal execution strings of
246+
compilation tools. The actual compilation will not happen.
247+
2. Modify the (usually) first execution string (it should have
248+
`-fsycl-is-device` option) by adding
249+
`-D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0` options to the
250+
end of the string. Execute all strings one by one.
251+
- `-fsycl-dead-args-optimization` can't help eliminate offset of
252+
accessor even though it's created with no offset specified
253+
- SYCL 2020 barriers show worse performance than SYCL 1.2.1 do. [18c80faa]
254+
- When using fallback assert in separate compilation flow it requires explicit
255+
linking against `lib/libsycl-fallback-cassert.o` or
256+
`lib/libsycl-fallback-cassert.spv`
257+
- Limit alignment of allocation requests at 64KB which is the only alignment
258+
supported by Level Zero. [7dfaf3bd]
259+
- In the following scenario on Level Zero backend:
260+
1. Kernel A, which uses buffer A, is submitted to queue A.
261+
2. Kernel B, which uses buffer B, is submitted to queue B.
262+
3. `queueA.wait()`.
263+
4. `queueB.wait()`.
264+
DPCPP runtime used to treat unmap/write commands for buffer A/B as host
265+
dependencies (i.e. they were waited for prior to enqueueing any command
266+
that's dependent on them). This allowed Level Zero plugin to detect that
267+
each queue is idle on steps 1/2 and submit the command list right away.
268+
This is no longer the case since we started passing these dependencies in an
269+
event waitlist and Level Zero plugin attempts to batch these commands, so
270+
the execution of kernel B starts only on step 4. The workaround restores the
271+
old behavior in this case until this is resolved. [2023e10d] [6c137f87]
272+
- User-defined functions with the name and signature matching those of any
273+
OpenCL C built-in function (i.e. an exact match of arguments, return type
274+
doesn't matter) can lead to Undefined Behavior.
275+
- A DPC++ system that has FPGAs installed does not support multi-process
276+
execution. Creating a context opens the device associated with the context
277+
and places a lock on it for that process. No other process may use that
278+
device. Some queries about the device through `device.get_info<>()` also
279+
open up the device and lock it to that process since the runtime needs
280+
to query the actual device to obtain that information.
281+
- The format of the object files produced by the compiler can change between
282+
versions. The workaround is to rebuild the application.
283+
- Using `sycl::kernel_bundle` API to refer to a kernel defined
284+
in another translation unit leads to undefined behavior
285+
- Linkage errors with the following message:
286+
`error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined`
287+
can happen when a SYCL application is built using MS Visual Studio 2019
288+
version below 16.3.0 and user specifies `-std=c++14` or `/std:c++14`.
289+
- Printing internal defines isn't supported on Windows. [50628db1]
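
For the built-in name clash described in the list above, one defensive habit is to keep helpers whose name and argument types match an OpenCL C built-in inside a namespace. The sketch below assumes the clash is keyed on the unqualified name and argument types; these notes do not state whether namespacing fully avoids the issue on every backend. `my_app` is a hypothetical namespace name.

```cpp
namespace my_app {

// Same name and argument types as the OpenCL C built-in
// mad(float, float, float). Keeping the helper in a namespace gives it
// a different qualified name than a global-scope definition would have.
inline float mad(float a, float b, float c) {
    return a * b + c;
}

}  // namespace my_app
```

Call sites then spell out `my_app::mad(a, b, c)`, which also makes it obvious to readers that a user-defined helper is intended.
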
290+
1291
# June'22 release notes
2292

3293
Release notes for commit range f34ba2c..4043dda
