
oneAPI DPC++ Compiler 2022-09

@tfzhu tfzhu released this 21 Oct 06:23
· 125632 commits to sycl since this release
0f579ba

New features

SYCL Compiler

  • Added ability to enforce stateless memory accesses for ESIMD. [1811162]
  • Added support for -fsycl-force-target compiler option. [1d95f2e]
  • Added support for [[intel::max_reinvocation_delay]] loop attribute. [90fa5bb]
  • Added support for -fsycl-huge-device-code compiler option, which allows
    linking object files larger than 2GB. [f963062]
  • Added support for compiling .cu files with SYCL compiler. [e76ad72]
  • Added support for assert on HIP backend. [ade1870]
  • Enabled CXX standard library functions for CUDA backend. [1fe92c5]
  • Implemented group collective built-in functions for more integral types. [d4933b6]

SYCL Library

  • Implemented SYCL 2020 callable device selectors. [64f0db7]
  • Implemented SYCL 2020 standalone device selectors. [bfc7e98]
  • Added SYCL 2020 property interfaces for local_accessor, usm_allocator,
    accessor and host_accessor classes. [1136b40] [da7dcf8]
  • Added support for fpga_simulator_selector. [9bef890]
  • Added support for local_accessor. Deprecated target::local. [e4423ef]
  • Added support for querying free device memory on Level Zero backend. [0eeef2b]
  • Added support for querying free device memory on CUDA and HIP backends. [436f0d8]
  • Implemented bfloat16 conversions from/to float for host. [2a383f1]
  • Added support for ext::oneapi::property::queue::discard_events to
    Level Zero PI plugin. [1372120]
  • Added lsc_atomic support on ESIMD emulator. [0c051a8]
  • Added dpas support on ESIMD emulator. [3d506a3]
  • Added C++ API for imf libdevice built-ins. [830916a]
  • Implemented make_queue for CUDA backend. [89460e8]
  • Implemented has_native_event and make_event for CUDA backend. [74369c8]
  • Added support of CUDA XPTI tracing. [0cd0414]
  • Introduced predicates for ESIMD lsc_block_store/load. [f44edce]
  • Added experimental set_kernel_properties API and use_double_grf property
    for ESIMD. [9a55da5]
  • Added "eager initialization" mode to Level Zero PI plugin. It might result
    in unnecessary work being done by the plugin, but ensures the fastest
    possible execution on hot and reportable paths. [c145959]
  • Added full support of element wise operations on joint_matrix on CUDA
    backend including bfloat16 support. [0a1d751]
  • Implemented group::get_linear_id(int) method [6e83c12]
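
For readers unfamiliar with the bfloat16 item above: converting between float and bfloat16 amounts to keeping or restoring the upper 16 bits of the IEEE-754 binary32 encoding. The following is a minimal standalone sketch of that idea, assuming round-to-nearest-even on narrowing. The helper names float_to_bfloat16_bits and bfloat16_bits_to_float are hypothetical and are not part of the DPC++ API; the actual host implementation referenced by [2a383f1] may differ.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative only: hypothetical helpers, not the DPC++ implementation.

// Narrow a float to the 16-bit bfloat16 encoding (sign, 8 exponent bits,
// 7 mantissa bits), rounding to nearest with ties to even.
inline std::uint16_t float_to_bfloat16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    if (std::isnan(f)) {
        // Set a mantissa bit that survives truncation so NaN stays NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen a bfloat16 encoding back to float. This direction is exact:
// the 16 bits become the upper half of the binary32 encoding.
inline float bfloat16_bits_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

With these helpers, 1.0f maps to the encoding 0x3F80, and any value that fits in 7 mantissa bits (such as -2.5f) survives a round trip unchanged.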

Documentation

Improvements

SYCL Library

  • Ensured that the correct errc is thrown for an unassociated placeholder
    accessor. [4f9935a]
  • Removed dependency on OpenCL ICD Loader from the runtime. [90e8b5e]
  • Added support for ZEBIN format to persistent caching mechanism. [34dcf83]
  • Added identification mechanism for binaries in newer ZEBIN format. [f4dee54]
  • Switched to use struct information descriptors in accordance with SYCL 2020.
    Removed some deprecated information queries. [b3cbda5]
  • Updated kernel_device_specific::max_sub_group_size query to match SYCL 2020
    spec. Deprecated the old variant. [7842d05]
  • Deprecated SYCL 1.2.1 device selectors. [c058380]
  • Improved error messages reported for unsupported device partitioning. [1c9ddba]
  • Made device and platform default to default_selector_v. [b32dd41]
  • Deprecated address_space::constant_space. [351b123]
  • Marked sycl::exception::has_context as noexcept. [ad923c9]
  • Improved range reductions performance on CPU. [3323da6]
  • Made sycl::exception nothrow copy constructible. [289e33d]
  • Marked has_property methods as noexcept. [417b5a2]
  • Improved sycl::event::get_profiling_info exception message when event is
    default constructed. [2e86cd4]
  • Added a diagnostic (in form of static_assert) about kernel lambda size
    mismatch between host and device. [d278c67] [ec179b7] [f417a88]
  • Updated pipes class to throw exceptions if used on host. [eab2969]
  • Updated ESIMD Emulator PI plugin to report support for cl_khr_fp64
    extension. [398571a]
  • Updated Level Zero plugin to prefer copy engine for memory read/write
    operations. [65c3ea2]
  • Optimized some memory transfers. [92d35cd]
  • Enabled event caching in Level Zero PI plugin. [a41b33c]
  • Optimized some reductions with parallel_for accepting sycl::range
    for discrete GPUs. [c22a5d3]
  • Improved performance of event synchronization on CUDA backend. [c4f326a]
  • Added ability to use descendent devices of context members within that
    context. Not supported with OpenCL backend yet. [a0c8c50] [78a483c]
  • Added support for querying atomic64 device capability with HIP backend. [cb190fc]
  • Enabled FTZ operations for CUDA/PTX backend via
    -fcuda-flush-denormals-to-zero. [e8e7ae8]
  • Improved error message about incorrect kernel argument types with CUDA backend. [2542e6a]
  • Limited allowed argument types for rol/ror ESIMD functions to better
    represent HW capabilities. [b05f256]
  • Implemented mem_advise reset and managed memory checks for CUDA backend. [fe18839]
  • Added concurrent memory check to mem_advise on CUDA backend. [33746d8]
  • Enabled multiple HIP streams per SYCL queue. [e0c40a9]
  • Implemented lazy mechanism of setting context for default-constructed events. [ed92c4c]
  • Improved performance for multi-dimensional accessors with multiple accesses
    in a kernel. [7c58b9a]

SYCL Compiler

  • Increased max _BitInt size to 4096 for FPGA target. [db5f72a] [3f06cad]
  • Removed deprecation message for [[intel::disable_loop_pipelining]] attribute. [07201f5]
  • Allowed __builtin_assume_aligned to be called from device code. [24937ea]
  • Improved link step performance when per_kernel device code split is used. [84de9d6]
  • Added support for SYCL_EXTERNAL on device_global variables. [8b958f6]
  • Updated __builtin_intel_fpga_mem to accept more parameters. [231338d]
  • Updated ivdep attribute to allow safelen = 0. [558b3ba]
  • Improved linking with sycl.lib on Windows. [404d281]
  • Implemented more diagnostics about incorrect device_global usages. [1265721]
  • Improved library resolution for libsycl.so. [4ce19d6]
  • Improved diagnostics when linking with mismatched objects. [0e0202e]
  • Added a warning for floating-point size changes after implicit conversions. [e4f5d55]
  • Made invoke_simd convert its argument to appropriate types. [038764f]

Documentation

  • Removed explicit cl namespace references. [433ea5c]
  • Added a short guideline on using CMake with SYCL compiler. [fa603c3]

Bug fixes

SYCL Library

  • Fixed a compilation issue where it wasn't possible to pass an initializer list
    for dependency events vector in queue shortcuts with offset
    parameter. [f4f83d9]
  • Fixed sycl::get_pointer_device throwing an exception when it is passed a
    descendent device (sub-device) instead of a root device. [26d5d98]
  • Fixed memory leak happening when kernel bundles are linked. [980677d]
  • Fixed USM free throwing an exception when it is passed a context created
    for a descendent device. [c49d494]
  • Fixed accessor's CTAD for g++ host compiler. [57aabe7]
  • Fixed a compilation issue when using multi-dimensional accessor's subscript
    operator. [22e3fc5]
  • Fixed "definition with the same mangled name" error happening when
    multiple buffer reductions are used in a kernel. [a0a4d72]
  • Fixed a compilation issue with SYCL math built-ins when GCC < 11.1 is used as
    a host compiler. [c786894]
  • Fixed a compilation issue with SYCL math built-ins (such as sycl::modf,
    for example) not accepting pointers to half. [e286166]
  • Fixed an issue with reductions when MSVC is used as host compiler. [94c4b80]
  • Fixed a compilation issue when fully specialized sycl::span is initialized
    from an array. [2b50820]
  • Fixed a crash in Level Zero PI plugins caused by specialization constants not
    being used on device side, but present in a program. [9500875]
  • Fixed event leak in Level Zero plugin. [6d04aa6]
  • Fixed an issue with sub-sub-devices in Level Zero plugin. [4b1b01b]
  • Fixed an issue with incorrect half conversion on ESIMD emulator. [6143e55]
  • Fixed a compilation issue with abs ESIMD function. [c72a85d]
  • Fixed some warnings coming out of SYCL headers when compiled in C++20 mode. [12ac4c3]
  • Fixed a compilation issue when using multiple bitwise shift operations
    in ESIMD. [40d08c2]
  • Fixed a crash in Level Zero PI plugin which occurs when the runtime tries to reset
    a command list which does not have a synchronization fence associated with it. [a61ac7a]
  • Fixed a performance issue with excessive streams synchronization
    on CUDA backend. [5352b42]
  • Fixed a compilation issue with
    sycl::get_native<sycl::backend::ext_oneapi_cuda>(sycl::device) free
    function (#6653). [4d69c29]
  • Fixed a synchronization issue for explicit dependencies (depends_on usage)
    that are blocked by a host task or host accessor. [346a6c5]
  • Fixed an issue in Level Zero plugin which could cause barriers to not be
    correctly applied for an entire queue. [d01371b]
  • Fixed a synchronization issue on CUDA backend. [e848c15]
  • Fixed an issue with a context not properly retained with event interop on
    CUDA backend. [2baf1de]
  • Fixed accessor so gdb can parse its template parameters correctly. [372cc94]
  • Fixed uses of common macro names in the implementation's header files. [e87adfd]
  • Fixed a performance regression in Level Zero backend related to command list. [8a4777d]

SYCL Compiler

  • Fixed cleanup of temporary files produced by unbundling archives. [7ef4b0a]
  • Fixed optimizing out device_global variables with internal linkage. [5b2cfe2]
  • Fixed an issue where compiling and linking with different optimization
    levels could cause runtime errors. [0cc7540]
  • Fixed description of -f[no-]sycl-unnamed-lambda compiler option. [616ecf7]
  • Fixed an issue when building SYCL programs in Debug mode with
    Windows-Clang.cmake. [084f34c]
  • Fixed CUDA fat-binaries filename extensions in SYCL toolchain
    (.o -> .cubin). [2c217cf]
  • Fixed an issue causing incorrect conversions involving unsigned types in
    ESIMD. [2a1151b]
  • Fixed a crash in applications containing a mix of unnamed ESIMD and non-ESIMD
    kernels. [3d54d7e]
  • Fixed type casting of image coordinates in built-ins implementation for CUDA
    backend. [338b4ed]
  • Fixed a linking issue when targeting AMD/HIP. [9829897]
  • Fixed an issue when op[] was called with typedef argument under gdb. [b7bfe39]

API/ABI breakages

  • Removed deprecated kernel::get_work_group_info [9488347]
  • Removed deprecated get_native class method [282e774]
  • Removed support for intel::fpga_pipeline attribute [6fe90d2]
  • Added MAJOR_VERSION to the name of the SYCL library on Windows. [c8427d6]
  • Removed sycl::program class. [85a6833]
  • Removed ext::oneapi::reduction. [7fa607f]
  • Removed deprecated address_space enum values. [351b123]
  • Removed event::get method. [da296ba]
  • Removed using namespace experimental inside ext::intel. [7ddd8a9]
  • Made intel-specific device info descriptors namespace qualified. [0f4a0f3]
  • Removed deprecated make_queue API. [26b7762]
  • Aligned return types of sycl::get_native and interop::get_native_mem
    functions to conform to the SYCL 2020 spec. [7827590]
  • Aligned sycl::buffer_allocator interface with SYCL 2020 spec. [7827590]
  • Removed cl namespace from sycl/sycl.hpp header. [3d2b25e]
  • Dropped support for compiling SYCL in less than C++17 mode. [43e713c]
  • Many other ABI-breaking changes resulting from internal refactoring

Known issues

  • This release is not backwards compatible with previous releases, which means
    that existing SYCL applications won't work with the newer runtime without
    re-compilation.
  • Having a MESA OpenCL implementation that provides no devices on a
    system may cause incorrect device discovery. As a workaround, such an OpenCL
    implementation can be disabled by removing /etc/OpenCL/vendor/mesa.icd.
  • Compilation may fail on Windows in debug mode if a kernel uses
    std::array. This happens because debug version of std::array in
    Microsoft STL C++ headers calls functions that are illegal for the device
    code. As a workaround the following can be done:
    1. Dump compiler pipeline execution strings by passing -### option to the
      compiler. The compiler will print the internal execution strings of
      compilation tools. The actual compilation will not happen.
    2. Modify the (usually) first execution string (it should have
      -fsycl-is-device option) by adding
      -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
      end of the string. Execute all strings one by one.
  • -fsycl-dead-args-optimization can't help eliminate offset of
    accessor even though it's created with no offset specified
  • SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers do. [18c80fa]
  • When using fallback assert in separate compilation flow it requires explicit
    linking against lib/libsycl-fallback-cassert.o or
    lib/libsycl-fallback-cassert.spv
  • Alignment of allocation requests is limited to 64KB, which is the only
    alignment supported by Level Zero. [7dfaf3b]
  • In the following scenario on Level Zero backend:
    1. Kernel A, which uses buffer A, is submitted to queue A.
    2. Kernel B, which uses buffer B, is submitted to queue B.
    3. queueA.wait().
    4. queueB.wait().
      DPCPP runtime used to treat unmap/write commands for buffer A/B as host
      dependencies (i.e. they were waited for prior to enqueueing any command
      that's dependent on them). This allowed Level Zero plugin to detect that
      each queue is idle on steps 1/2 and submit the command list right away.
      This is no longer the case since we started passing these dependencies in an
      event waitlist and Level Zero plugin attempts to batch these commands, so
      the execution of kernel B starts only on step 4. The workaround restores the
      old behavior in this case until this is resolved. [2023e10] [6c137f8]
  • User-defined functions with the name and signature matching those of any
    OpenCL C built-in function (i.e. an exact match of arguments, return type
    doesn't matter) can lead to Undefined Behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process
    execution. Creating a context opens the device associated with the context
    and places a lock on it for that process. No other process may use that
    device. Some queries about the device through device.get_info<>() also
    open up the device and lock it to that process since the runtime needs
    to query the actual device to obtain that information.
  • The format of the object files produced by the compiler can change between
    versions. The workaround is to rebuild the application.
  • Using sycl::kernel_bundle API to refer to a kernel defined
    in another translation unit leads to undefined behavior
  • Linkage errors with the following message:
    error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
    can happen when a SYCL application is built using MS Visual Studio 2019
    version below 16.3.0 and user specifies -std=c++14 or /std:c++14.
  • Printing internal defines isn't supported on Windows. [50628db]