|
| 1 | +# September'22 release notes |
| 2 | + |
| 3 | +Release notes for commit range [`4043dda3..0f579bae`](https://github.com/intel/llvm/compare/4043dda3...0f579bae) |
| 4 | + |
| 5 | +## New features |
| 6 | + |
| 7 | +### SYCL Compiler |
| 8 | + |
| 9 | +- Added ability to enforce stateless memory accesses for ESIMD. [18111623] |
| 10 | +- Added support for `-fsycl-force-target` compiler option. [1d95f2ec] |
| 11 | +- Added support for `[[intel::max_reinvocation_delay]]` loop attribute. [90fa5bb0] |
| 12 | +- Added support for `-fsycl-huge-device-code` compiler option, which allows |
| 13 | + linking object files larger than 2GB. [f963062c] |
| 14 | +- Added support for compiling `.cu` files with SYCL compiler. [e76ad72f] |
| 15 | +- Added support for `assert` on HIP backend. [ade1870a] |
| 16 | +- Enabled CXX standard library functions for CUDA backend. [1fe92c54] |
| 17 | +- Implemented group collective built-in functions for more integral types. [d4933b6f] |
| 18 | + |
| 19 | +### SYCL Library |
| 20 | + |
| 21 | +- Implemented SYCL 2020 callable device selectors. [64f0db7a] |
| 22 | +- Implemented SYCL 2020 standalone device selectors. [bfc7e984] |
| 23 | +- Added SYCL 2020 property interfaces for `local_accessor`, `usm_allocator`, |
| 24 | + `accessor` and `host_accessor` classes. [1136b403] [da7dcf82] |
| 25 | +- Added support for `fpga_simulator_selector`. [9bef890d] |
| 26 | +- Added support for `local_accessor`. Deprecated `target::local`. [e4423ef4] |
| 27 | +- Added support for querying free device memory on Level Zero backend. [0eeef2b3] |
| 28 | +- Added support for querying free device memory on CUDA and HIP backends. [436f0d89] |
| 29 | +- Implemented `bfloat16` conversions from/to `float` for host. [2a383f1c] |
| 30 | +- Added support for `ext::oneapi::property::queue::discard_events` to |
| 31 | + Level Zero PI plugin. [13721204] |
| 32 | +- Added `lsc_atomic` support on ESIMD emulator. [0c051a89] |
| 33 | +- Added `dpas` support on ESIMD emulator. [3d506a34] |
| 34 | +- Added C++ API for `imf` libdevice built-ins. [830916a3] |
| 35 | +- Implemented `make_queue` for CUDA backend. [89460e81] |
| 36 | +- Implemented `has_native_event` and `make_event` for CUDA backend. [74369c84] |
| 37 | +- Added support for CUDA XPTI tracing. [0cd04144] |
| 38 | +- Introduced predicates for ESIMD `lsc_block_store/load`. [f44edce3] |
| 39 | +- Added experimental `set_kernel_properties` API and `use_double_grf` property |
| 40 | + for ESIMD. [9a55da53] |
| 41 | +- Added "eager initialization" mode to Level Zero PI plugin. It might result |
| 42 | + in unnecessary work being done by the plugin, but ensures the fastest possible |
| 43 | + execution on hot and reportable paths. [c1459598] |
| 44 | +- Added full support for element-wise operations on `joint_matrix` on CUDA |
| 45 | + backend including `bfloat16` support. [0a1d751b] |
| 46 | +- Implemented `group::get_linear_id(int)` method. [6e83c127] |
| 47 | + |
| 48 | +### Documentation |
| 49 | + |
| 50 | +- Added stateful to stateless memory access conversion |
| 51 | + [design document](sycl/doc/design/ESIMDStatelesAccessors.md). [3e03f300] |
| 52 | +- Added [`sycl_ext_oneapi_complex`](sycl/doc/extensions/proposed/sycl_ext_oneapi_complex.asciidoc) |
| 53 | + extension proposal. [01589da5] |
| 54 | +- Updated [`sycl_ext_intel_fpga_device_selector`](sycl/doc/extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc) |
| 55 | + extension to add `fpga_simulator_selector`. [9bef890d] |
| 56 | +- Added [`sycl_ext_intel_fpga_kernel_interface_properties`](sycl/doc/extension/proposed/sycl_ext_intel_fpga_kernel_interface_properties.asciidoc) extension proposal. [4b6bd14b] |
| 57 | +- Updated [`sycl_ext_oneapi_complex_algorithms`](sycl/doc/extensions/proposed/sycl_ext_oneapi_complex_algorithms.asciidoc) |
| 58 | + extension to include `sycl::complex` as supported type for algorithms. [07c5b48f] |
| 59 | +- Clarified sub-group size calculation in [`sycl_ext_oneapi_invoke_simd`](sycl/doc/extensions/experimental/sycl_ext_oneapi_invoke_simd.asciidoc) extension spec. [9b33ad0f] |
| 60 | +- Updated [`sycl_ext_oneapi_accessor_properties`](sycl/doc/extensions/supported/sycl_ext_oneapi_accessor_properties.asciidoc) |
| 61 | + to mark `has_property` API as `noexcept`. [7805aa3f] |
| 62 | +- Updated [`sycl_ext_intel_device_info`](sycl/doc/extensions/supported/sycl_ext_intel_device_info.md) |
| 63 | + to support querying free device memory. [0eeef2b3] |
| 64 | +- Updated [`sycl_ext_oneapi_matrix`](sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix.asciidoc) |
| 65 | + with description of new matrix features. [770f540d] |
| 66 | +- Moved [`sycl_ext_oneapi_invoke_simd`](sycl/doc/extensions/experimental/sycl_ext_oneapi_invoke_simd.asciidoc) |
| 67 | + extension specification from `proposed` to `experimental` because |
| 68 | + an implementation is available. [6bee3440] |
| 69 | + |
| 70 | +## Improvements |
| 71 | + |
| 72 | +### SYCL Library |
| 73 | + |
| 74 | +- Ensured that the correct `errc` is thrown for an unassociated placeholder |
| 75 | + accessor. [4f9935ad] |
| 76 | +- Removed dependency on OpenCL ICD Loader from the runtime. [90e8b5ef] |
| 77 | +- Added support for `ZEBIN` format to persistent caching mechanism. [34dcf83d] |
| 78 | +- Added identification mechanism for binaries in newer `ZEBIN` format. [f4dee549] |
| 79 | +- Switched to use `struct` information descriptors in accordance with SYCL 2020. |
| 80 | + Removed some deprecated information queries. [b3cbda57] |
| 81 | +- Updated `kernel_device_specific::max_sub_group_size` query to match SYCL 2020 |
| 82 | + spec. Deprecated the old variant. [7842d056] |
| 83 | +- Deprecated SYCL 1.2.1 device selectors. [c058380f] |
| 84 | +- Improved error messages reported for unsupported device partitioning. [1c9ddbaa] |
| 85 | +- Made `device` and `platform` default to `default_selector_v`. [b32dd41a] |
| 86 | +- Deprecated `address_space::constant_space`. [351b123a] |
| 87 | +- Marked `sycl::exception::has_context` as `noexcept`. [ad923c9a] |
| 88 | +- Improved range reductions performance on CPU. [3323da65] |
| 89 | +- Made `sycl::exception` `nothrow` copy constructible. [289e33da] |
| 90 | +- Marked `has_property` methods as `noexcept`. [417b5a29] |
| 91 | +- Improved `sycl::event::get_profiling_info` exception message when `event` is |
| 92 | + default constructed. [2e86cd459] |
| 93 | +- Added a diagnostic (in form of `static_assert`) about kernel lambda size |
| 94 | + mismatch between host and device. [d278c671] [ec179b7a] [f417a88e] |
| 95 | +- Updated `pipes` class to throw exceptions if used on host. [eab29696] |
| 96 | +- Updated ESIMD Emulator PI plugin to report support for `cl_khr_fp64` |
| 97 | + extension. [398571a5] |
| 98 | +- Updated Level Zero plugin to prefer copy engine for memory read/write |
| 99 | + operations. [65c3ea29] |
| 100 | +- Optimized some memory transfers. [92d35cd1] |
| 101 | +- Enabled event caching in Level Zero PI plugin. [a41b33c3] |
| 102 | +- Optimized some reductions with `parallel_for` accepting `sycl::range` |
| 103 | + for discrete GPUs. [c22a5d3f] |
| 104 | +- Improved performance of event synchronization on CUDA backend. [c4f326aa] |
| 105 | +- Added ability to use descendent devices of context members within that |
| 106 | + context. Not supported with OpenCL backend yet. [a0c8c503] [78a483c7] |
| 107 | +- Added support for querying `atomic64` device capability with HIP backend. [cb190fc2] |
| 108 | +- Enabled FTZ operations for CUDA/PTX backend via |
| 109 | + `-fcuda-flush-denormals-to-zero`. [e8e7ae83] |
| 110 | +- Improved error message about incorrect kernel argument types with CUDA backend. [2542e6a8] |
| 111 | +- Limited allowed argument types for `rol/ror` ESIMD functions to better |
| 112 | + represent HW capabilities. [b05f256c] |
| 113 | +- Implemented `mem_advise` reset and managed memory checks for CUDA backend. [fe18839c] |
| 114 | +- Added concurrent memory check to `mem_advise` on CUDA backend. [33746d8d] |
| 115 | +- Enabled multiple HIP streams per SYCL queue. [e0c40a9f] |
| 116 | +- Implemented lazy mechanism of setting context for default-constructed events. [ed92c4ca] |
| 117 | +- Improved performance for multi-dimensional accessors with multiple accesses |
| 118 | + in a kernel. [7c58b9a6] |
| 119 | + |
| 120 | +### SYCL Compiler |
| 121 | + |
| 122 | +- Increased max `_BitInt` size to 4096 for FPGA target. [db5f72a8] [3f06cad0] |
| 123 | +- Removed deprecation message for `[[intel::disable_loop_pipelining]]` attribute. [07201f56] |
| 124 | +- Allowed `__builtin_assume_aligned` to be called from device code. [24937eac] |
| 125 | +- Improved link step performance when `per_kernel` device code split is used. [84de9d6d] |
| 126 | +- Added support for `SYCL_EXTERNAL` on `device_global` variables. [8b958f67] |
| 127 | +- Updated `__builtin_intel_fpga_mem` to accept more parameters. [231338dc] |
| 128 | +- Updated `ivdep` attribute to allow `safelen = 0`. [558b3ba4] |
| 129 | +- Improved linking with `sycl.lib` on Windows. [404d2816] |
| 130 | +- Implemented more diagnostics about incorrect `device_global` usages. [12657218] |
| 131 | +- Improved library resolution for `libsycl.so`. [4ce19d69] |
| 132 | +- Improved diagnostics when linking with mismatched objects. [0e0202ee] |
| 133 | +- Added a warning for floating-point size changes after implicit conversions. [e4f5d55f] |
| 134 | +- Made `invoke_simd` convert its arguments to appropriate types. [038764fd] |
| 135 | + |
| 136 | +### Documentation |
| 137 | + |
| 138 | +- Removed explicit `cl` namespace references. [433ea5c7] |
| 139 | +- Added a short guideline on using CMake with SYCL compiler. [fa603c3e] |
| 140 | + |
| 141 | +## Bug fixes |
| 142 | + |
| 143 | +### SYCL Library |
| 144 | + |
| 145 | +- Fixed a compilation issue where it wasn't possible to pass an initializer list |
| 146 | + for dependency events vector in `queue` shortcuts with `offset` |
| 147 | + parameter. [f4f83d95] |
| 148 | +- Fixed `sycl::get_pointer_device` throwing an exception when passed a |
| 149 | + descendent device (sub-device) instead of a root device. [26d5d98b] |
| 150 | +- Fixed a memory leak that occurred when kernel bundles are linked. [980677d9] |
| 151 | +- Fixed USM free throwing an exception when passed a context created for |
| 152 | + a descendent device. [c49d4944] |
| 153 | +- Fixed `accessor`'s CTAD for `g++` host compiler. [57aabe7e] |
| 154 | +- Fixed a compilation issue when using multi-dimensional `accessor`'s subscript |
| 155 | + operator. [22e3fc56] |
| 156 | +- Fixed "definition with the same mangled name" error that occurred when |
| 157 | + multiple buffer reductions were used in a kernel. [a0a4d721] |
| 158 | +- Fixed a compilation issue with SYCL math built-ins when GCC < 11.1 is used as |
| 159 | + a host compiler. [c786894f] |
| 160 | +- Fixed a compilation issue with SYCL math built-ins (such as `sycl::modf`, |
| 161 | + for example) not accepting pointers to `half`. [e2861665] |
| 162 | +- Fixed an issue with `reduction`s when MSVC is used as a host compiler. [94c4b80a] |
| 163 | +- Fixed a compilation issue when fully specialized `sycl::span` is initialized |
| 164 | + from an array. [2b50820b] |
| 165 | +- Fixed a crash in Level Zero PI plugin caused by specialization constants not |
| 166 | + being used on device side, but present in a program. [9500875f] |
| 167 | +- Fixed event leak in Level Zero plugin. [6d04aa64] |
| 168 | +- Fixed an issue with sub-sub-devices in Level Zero plugin. [4b1b01bd] |
| 169 | +- Fixed an issue with incorrect `half` conversion on ESIMD emulator. [6143e55a] |
| 170 | +- Fixed a compilation issue with `abs` ESIMD function. [c72a85dd] |
| 171 | +- Fixed some warnings coming out of SYCL headers when compiled in C++20 mode. [12ac4c36] |
| 172 | +- Fixed a compilation issue when using multiple bitwise shift operations |
| 173 | + in ESIMD. [40d08c23] |
| 174 | +- Fixed a crash in Level Zero PI plugin which occurs when the runtime tries to reset |
| 175 | + a command list which does not have a synchronization fence associated with it. [a61ac7a0] |
| 176 | +- Fixed a performance issue with excessive streams synchronization |
| 177 | + on CUDA backend. [5352b423] |
| 178 | +- Fixed a compilation issue with |
| 179 | + `sycl::get_native<sycl::backend::ext_oneapi_cuda>(sycl::device)` free |
| 180 | + function (intel/llvm#6653). [4d69c297] |
| 181 | +- Fixed a synchronization issue for explicit dependencies (`depends_on` usage) |
| 182 | + that are blocked by a host task or host accessor. [346a6c53] |
| 183 | +- Fixed an issue in Level Zero plugin which could cause barriers to not be |
| 184 | + correctly applied for an entire queue. [d01371b3] |
| 185 | +- Fixed a synchronization issue on CUDA backend. [e848c15f] |
| 186 | +- Fixed an issue with a context not being properly retained with event interop on |
| 187 | + CUDA backend. [2baf1de5] |
| 188 | +- Fixed `accessor` so gdb can parse its template parameters correctly. [372cc948] |
| 189 | +- Fixed uses of common macro names in the implementation's header files. [e87adfd2] |
| 190 | +- Fixed a performance regression in Level Zero backend related to command list. [8a4777d0] |
| 191 | + |
| 192 | +### SYCL Compiler |
| 193 | + |
| 194 | +- Fixed cleanup of temporary files produced by unbundling archives. [7ef4b0a3] |
| 195 | +- Fixed optimizing out `device_global` variables with internal linkage. [5b2cfe21] |
| 196 | +- Fixed an issue where compiling and linking with different optimization levels |
| 197 | + could cause runtime errors. [0cc7540e] |
| 198 | +- Fixed description of `-f[no-]sycl-unnamed-lambda` compiler option. [616ecf75] |
| 199 | +- Fixed an issue when building SYCL programs in Debug mode with |
| 200 | + `Windows-Clang.cmake`. [084f34c1] |
| 201 | +- Fixed CUDA fat-binaries filename extensions in SYCL toolchain |
| 202 | + (`.o` -> `.cubin`). [2c217cfd] |
| 203 | +- Fixed an issue causing incorrect conversions involving unsigned types in |
| 204 | + ESIMD. [2a1151b8] |
| 205 | +- Fixed a crash in applications containing a mix of unnamed ESIMD and non-ESIMD |
| 206 | + kernels. [3d54d7ef] |
| 207 | +- Fixed type casting of image coordinates in built-ins implementation for CUDA |
| 208 | + backend. [338b4edc] |
| 209 | +- Fixed a linking issue when targeting AMD/HIP. [9829897a] |
| 210 | +- Fixed an issue when `op[]` was called with typedef argument under gdb. [b7bfe391] |
| 211 | + |
| 212 | +## API/ABI breakages |
| 213 | + |
| 214 | +- Removed deprecated `kernel::get_work_group_info`. [94883470] |
| 215 | +- Removed deprecated `get_native` class method. [282e7744] |
| 216 | +- Removed support for `intel::fpga_pipeline` attribute. [6fe90d20] |
| 217 | +- Added `MAJOR_VERSION` to the name of the SYCL library on Windows. [c8427d6d] |
| 218 | +- Removed `sycl::program` class. [85a6833d] |
| 219 | +- Removed `ext::oneapi::reduction`. [7fa607fb] |
| 220 | +- Removed deprecated `address_space` enum values. [351b123a] |
| 221 | +- Removed `event::get` method. [da296ba3] |
| 222 | +- Removed `using namespace experimental` inside `ext::intel`. [7ddd8a97] |
| 223 | +- Made intel-specific device info descriptors namespace qualified. [0f4a0f3b] |
| 224 | +- Removed deprecated `make_queue` API. [26b7762a] |
| 225 | +- Aligned return types of `sycl::get_native` and `interop::get_native_mem` |
| 226 | + functions with SYCL 2020 spec. [78275902] |
| 227 | +- Aligned `sycl::buffer_allocator` interface with SYCL 2020 spec. [78275902] |
| 228 | +- Removed `cl` namespace from `sycl/sycl.hpp` header. [3d2b25e1] |
| 229 | +- Dropped support for compiling SYCL in less than C++17 mode. [43e713c3] |
| 230 | +- Many other ABI-breaking changes resulting from internal refactoring. |
| 231 | + |
| 232 | +## Known issues |
| 233 | + |
| 234 | +- This release is not backwards compatible with previous releases, which means |
| 235 | + that existing SYCL applications won't work with the newer runtime without |
| 236 | + re-compilation. |
| 237 | +- Having a MESA OpenCL implementation which provides no devices on a |
| 238 | + system may cause incorrect device discovery. As a workaround, such an OpenCL |
| 239 | + implementation can be disabled by removing `/etc/OpenCL/vendor/mesa.icd`. |
| 240 | +- Compilation may fail on Windows in debug mode if a kernel uses |
| 241 | + `std::array`. This happens because the debug version of `std::array` in |
| 242 | + Microsoft STL C++ headers calls functions that are illegal for the device |
| 243 | + code. As a workaround the following can be done: |
| 244 | + 1. Dump compiler pipeline execution strings by passing `-###` option to the |
| 245 | + compiler. The compiler will print the internal execution strings of |
| 246 | + compilation tools. The actual compilation will not happen. |
| 247 | + 2. Modify the (usually) first execution string (it should have |
| 248 | + `-fsycl-is-device` option) by adding |
| 249 | + `-D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0` options to the |
| 250 | + end of the string. Execute all strings one by one. |
| 251 | +- `-fsycl-dead-args-optimization` can't eliminate the offset of an |
| 252 | + accessor even when it's created with no offset specified. |
| 253 | +- SYCL 2020 barriers show worse performance than SYCL 1.2.1 do. [18c80faa] |
| 254 | +- Using fallback assert in a separate compilation flow requires explicit |
| 255 | + linking against `lib/libsycl-fallback-cassert.o` or |
| 256 | + `lib/libsycl-fallback-cassert.spv`. |
| 257 | +- Alignment of allocation requests is limited to 64KB, which is the only |
| 258 | + alignment supported by Level Zero. [7dfaf3bd] |
| 259 | +- In the following scenario on Level Zero backend: |
| 260 | + 1. Kernel A, which uses buffer A, is submitted to queue A. |
| 261 | + 2. Kernel B, which uses buffer B, is submitted to queue B. |
| 262 | + 3. `queueA.wait()`. |
| 263 | + 4. `queueB.wait()`. |
| 264 | + DPCPP runtime used to treat unmap/write commands for buffer A/B as host |
| 265 | + dependencies (i.e. they were waited for prior to enqueueing any command |
| 266 | + that's dependent on them). This allowed Level Zero plugin to detect that |
| 267 | + each queue is idle on steps 1/2 and submit the command list right away. |
| 268 | + This is no longer the case since we started passing these dependencies in an |
| 269 | + event waitlist and Level Zero plugin attempts to batch these commands, so |
| 270 | + the execution of kernel B starts only on step 4. The workaround restores the |
| 271 | + old behavior in this case until this is resolved. [2023e10d] [6c137f87] |
| 272 | +- User-defined functions with the name and signature matching those of any |
| 273 | + OpenCL C built-in function (i.e. an exact match of arguments, return type |
| 274 | + doesn't matter) can lead to undefined behavior. |
| 275 | +- A DPC++ system that has FPGAs installed does not support multi-process |
| 276 | + execution. Creating a context opens the device associated with the context |
| 277 | + and places a lock on it for that process. No other process may use that |
| 278 | + device. Some queries about the device through `device.get_info<>()` also |
| 279 | + open up the device and lock it to that process since the runtime needs |
| 280 | + to query the actual device to obtain that information. |
| 281 | +- The format of the object files produced by the compiler can change between |
| 282 | + versions. The workaround is to rebuild the application. |
| 283 | +- Using `sycl::kernel_bundle` API to refer to a kernel defined |
| 284 | + in another translation unit leads to undefined behavior. |
| 285 | +- Linkage errors with the following message: |
| 286 | + `error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined` |
| 287 | + can happen when a SYCL application is built using MS Visual Studio 2019 |
| 288 | + version below 16.3.0 and the user specifies `-std=c++14` or `/std:c++14`. |
| 289 | +- Printing internal defines isn't supported on Windows. [50628db1] |
| 290 | + |
1 | 291 | # June'22 release notes
|
2 | 292 |
|
3 | 293 | Release notes for commit range f34ba2c..4043dda
|
|