
oneAPI DPC++ Compiler 2022-06

@tfzhu tfzhu released this 28 Jun 06:26
· 135233 commits to sycl since this release
4043dda

New features

SYCL Compiler

  • Added -fcuda-prec-sqrt frontend compiler option which enables a higher-precision version of sqrt in device code. [ebf9ea8]
  • Added support for local memory accessors for the HIP backend. [58508ba]
  • Added initial support of -lname processing when searching for fat static libraries. [35e32d8] [a33f9c8]
  • Added -fsycl-fp32-prec-sqrt flag which enables correctly rounded sycl::sqrt. [5c8b7e7]
  • Added support for [[intel::loop_count()]] attribute. [c536e76]
  • Added support for passing driver options to JIT compiler and linker. [1c93bfe]
  • Added default argument support for work_group_size_hint attribute. [0cff80e]
  • Added support for float and double exchange and compare exchange atomic operations in CUDA libclc. [1d84c99]
  • Added -ffast-math support for CUDA libclc. [0f0c5d1]
  • Added support for software atomics (except for the ones using system scope) for lower sm versions of CUDA architecture. Enabled SYCL_USE_NATIVE_FP_ATOMICS by default. [7bc8447]
  • Added support for the global offset for AMDGPU. [2dc3c06]
  • Added support for asynchronous barrier for CUDA backend sm 80+. [6770421]
  • Added -f[no-]sycl-device-lib-jit-link option to control JIT linking of SYCL device libraries. [dfb37a8] [c946286]
  • Added support for the new FPGA attribute [[intel::fpga_pipeline(N)]] for loop pipelining. [92aadf3]
  • Added assert support for Windows NVPTX. [f29b498]
  • Added support for sycl_ext_oneapi_properties extension. [87f60f6][1984e74][a2583ec][cdf561a][d2982c6][35c2e00]
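
The software atomics mentioned above target hardware without a native floating-point atomic. Such operations are commonly built from a compare-exchange loop. The host-side C++ sketch below illustrates that general technique with std::atomic<float>; it is not the libclc implementation, and the function name fetch_add_via_cas is hypothetical.

```cpp
#include <atomic>

// Emulates an atomic floating-point add with a compare-exchange loop.
// Returns the value that was stored before the addition, like fetch_add.
// (Illustrative host code only; the name is hypothetical.)
inline float fetch_add_via_cas(std::atomic<float> &target, float operand) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads `expected`, so the sum is
  // recomputed from the latest stored value on every retry.
  while (!target.compare_exchange_weak(expected, expected + operand,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}
```

The loop only retries when another thread changed the value between the load and the exchange, which is why this pattern works wherever a compare-exchange primitive of the right width is available.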

SYCL Library

  • Added support for Nvidia MMA for bf16, mixed precision int ((u)int8/int32), and mixed precision float (half/float). [5373362]
  • Added a mode for the Level Zero plugin where only the last command in each batch yields a host-visible event. Enabled this mode by default. [c6b7b8e]
  • Added an option to query for atomic scope capabilities for the CUDA backend. Updated returns for atomics memory order capabilities. [43a4192]
  • Added support for an experimental Level Zero API for host pointer import into USM. The feature can be enabled using SYCL_USM_HOSTPTR_IMPORT environment variable. [844d7b6]
  • Added support for the wi_element for bf16 type. [9f2b7bd]
  • Added complex support for the reduce and scan group algorithms. [90a4dc7]
  • Added support for SYCL 2020 methods in the group class. [73d59ce]
  • Added SYCL_RT_WARNING_LEVEL environment variable which controls the amount of warnings and performance hints the runtime library may print. [2741010]
  • Added tanh (for floats/halfs) and exp2 (for halfs) native definitions for CUDA backend. [250c498]
  • Added bf16 builtins for fma, fmin and fmax on CUDA backend. [62651dd]
  • Added support for USM buffer location properties which specify the memory location in which a device USM allocation should be placed. [12c988a]
  • Added support for buffer_location property to the sycl::buffer. [9808525]
  • Added single_task support for ESIMD_EMULATOR backend. [2331160]
  • Added support for SVM 1,2,4-elements gather/scatter for ESIMD. [e200720]
  • Added support for bf16 builtins operating on storage types for CUDA backend. [413a9ef]
  • Added support for backend_version device property for CUDA backend. [4b1a4bc]
  • Added support for round-robin submissions to multiple compute CCS for the Level Zero backend. Disabled by default, can be controlled using SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE. [a836c87]
  • Added support for buffer migration for contexts with multiple devices in the Level Zero plugin. [7baf152]
  • Added a mode in which the Level Zero plugin uses immediate command-lists instead of standard command-lists. This mode is disabled by default and can be enabled using the SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS environment variable. [b9cb1d1]
  • Added support for sycl::get_native(sycl::buffer) for OpenCL and CUDA backends. [8b3c8c4]
  • Added reduction overloads accepting span. [863383b]
  • Added LSC support for ESIMD_EMULATOR backend. [b78bf00]
  • Added half type support for __esimd_convertvector_to/from. [0bfffd6]
  • Added buffer_allocator SYCL 2020 conformant variant. [53430c8]
  • Added support for the USM buffer location property in malloc_shared. [6e89821] [9f61c8e][8c4d9a5]
  • Added support for the USM buffer location property in malloc_host. [2c7caab]
  • Added experimental context and device interoperability support for CUDA. [f0df89a]
  • Added support for memory intrinsics for the ESIMD_EMULATOR plugin. [1a8f501]
  • Added support for named barrier APIs for ESIMD. [1df0038]
  • Added support for DPAS API for ESIMD. [5881938]
  • Added support for LSC memory access APIs for ESIMD. [4bd50e7]
  • Added support for the invoke_simd feature. [4072557][8471ff3][8c7bb45][62afb59][3e1c1bf]
  • Added support for info::device::atomic64 for OpenCL and Level Zero backends. [8feb558]
  • Added support for sycl_ext_oneapi_usm_device_read_only extension [644c614][58c9d3a]
  • Added support for mapping/unmapping operations for ESIMD_EMULATOR plugin. [bc0579a]
  • Added support for make_buffer API for the Level Zero backend. [7c49984]
  • Added interoperability support for HIP backend. [e06d1b5]
  • Added missing +-*/ operations for half. [059efbc]
  • Introduced new environment variable SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ to control the max local memory allowed to be allocated per kernel on CUDA backend. [2e24304]
  • Added ext_intel_global_host_space in accordance with sycl_ext_intel_usm_address_spaces extension. [7a2f44b]
  • Added aspect for bfloat16. [f84fc32]
  • Introduced "Intel math functions" device library with support of type cast util functions for float, double and integer type. [a310952]
  • Added bfloat16 support for joint_matrix [6ac62ab]
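
Several entries above refer to bf16 values and their storage types. bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 binary32 value, so it fits in 16 bits. The host-side sketch below assumes a 16-bit unsigned integer as the storage type and uses round-to-nearest-even; the function names are hypothetical and this is not the DPC++ implementation.

```cpp
#include <cstdint>
#include <cstring>

// Converts a float to bfloat16 storage bits with round-to-nearest-even.
// (Hypothetical helper for illustration.)
inline std::uint16_t float_to_bf16_bits(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Keep NaN a NaN: force a mantissa bit so truncation cannot yield infinity.
  if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add half of the discarded range, plus one if the kept LSB is odd.
  const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Widens bfloat16 storage bits back to a float; this direction is exact.
inline float bf16_bits_to_float(std::uint16_t storage) {
  const std::uint32_t bits = static_cast<std::uint32_t>(storage) << 16;
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Values such as 1.0f or -2.0f, whose low 16 bits are zero in binary32, round-trip unchanged; other values lose mantissa precision.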

Documentation

Tools

  • Implemented property set generation for device globals in the sycl-post-link. Added the --device-globals command-line argument for lowering and generating information about device global variables. [88123c1]
  • Introduced XPTI-based tools for SYCL applications: sycl-trace, sycl-prof, sycl-sanitize. [789d138]
  • Added support for tracing Level Zero API calls using XPTI and updated the sycl-trace tool to display both PI and Level Zero calls simultaneously. [fc9cf52]

Improvements

SYCL Compiler

  • Added a diagnostic on attempt to use zero length arrays in the device code [52e8f58]
  • Added support for consuming fat objects containing SPIR-V [c878063]
  • Added support for generating SPIR-V based fat objects [1e94ef3]
  • Added support for group collective functions for HIP backend [106882c]
  • Added a diagnostic on attempt to use -fsycl and -static-libstdc++ together. This combination is not supported due to the runtime dependency on libsycl.so [bb0055c]
  • Added support for atomic loads and stores with various memory orders and scopes [e15ac50] [6b2635e]
  • Improved performance of accessing memory pointed to by sycl::accessor for FPGA device [fbab374]
  • Added support for setting CUDA installation directory using CUDA_PATH environment variable [b0c145a]
  • Improved deferred diagnostics for usages within function templates in device code. [d0efca5]
  • Added support for sycl_special_class attribute to mark SYCL classes/structs that need additional compiler handling. [8ba9c79]
  • Improved driver to do device section checking only when offloading is enabled. [3742b93]
  • Allowed calls to constant expression function pointers in device code. [e84c952]
  • Disabled passing of code coverage and profiling options to device compilation.
  • Added clang support of code location information for kernels. [96d2e17]
  • Disallowed explicit casts between mismatching address spaces. [1cee960]
  • Added support of [[sycl::device_has]] attribute on kernel. [aa2162c]
  • Added a warning on explicit cast from default address space to named. [9adb25b]
  • Added a warning for converting 'c' input to 'c++' in SYCL mode. [5b62ee0]
  • Silenced unknown attribute warnings on host compilation. [2d359df]
  • Added a diagnostic on attempt to use accessor::operator[] in ESIMD code. [9d7a651]
  • Expanded driver's ability to discover fat static archives after /link option on Windows. [271ef40]
  • Added support for saving user specified names for lambda captures in kernel lambda object for FPGA target. [af29982] [5ffb2ee]
  • Adjusted the compilation when preprocessing is requested to allow for the device compilation to fail and continue to perform the preprocessing steps. [7f2e99c]
  • Added the ability to detect a kernel size mismatch in the case when the host and device compilers are different. [ef90e6a]
  • Improved handling of specialization constants by backends. [e62b5aa]
  • Improved support of -mlong-double options. [6083920]
  • Improved -save-temps to allow optimization when performing a SYCL device compilation. [05fe5ae]
  • Removed warning diagnostic on host compilation when using __attribute__((sycl_device)). [49e595e]
  • Introduced multiple streams in each queue for CUDA backend to improve concurrent execution. [dd41845]
  • Improved compiler to collect information for optimization record only if optimization record is saved by user (i.e. -fsave-optimization-record or -opt-record-file is passed). [cb94c80]
  • Improved [[intel::max_concurrency()]] attribute support. [b6aa4cc]
  • Added the new kernel_arg_exclusive_ptr metadata which guarantees that the kernel pointer argument, or pointers that derive from it, will not be dereferenced outside current invocation of the kernel. [e03c4ed]
  • Added a warning for the case when invalid subgroup size is used on kernel for CUDA backend. [6dab69f]
  • Improved deprecation messaging for options. [f0b65a1]
  • Improved diagnostic behavior for -fsanitize with -fsycl. [9397cbc]
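
As an illustration of the "constant expression function pointers" entry above, the plain C++ sketch below shows a call through a constexpr function pointer. It is host-only code with hypothetical names; in a SYCL program the call would sit inside a kernel, which is not shown here.

```cpp
// A small function that stands in for code callable from a kernel.
constexpr int twice(int x) { return 2 * x; }

// The pointer itself is a constant expression, so the callee is known
// at compile time even though the call syntax is indirect.
constexpr int (*kScale)(int) = &twice;

// Calls through the constant-expression pointer.
constexpr int apply_scale(int x) { return kScale(x); }

// The compiler can resolve the whole call chain at compile time.
static_assert(apply_scale(21) == 42, "call resolved at compile time");
```

A pointer whose value is only known at run time is a different case and is not covered by this sketch.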

SYCL Library

  • Added support for sycl::ctz API [d5eb769]
  • Improved the diagnostic for unresolved symbols in the device code for Level Zero backend [33cfb9f]
  • Added several arithmetic operations for sycl::ext::oneapi::experimental::wi_element [fe2e35e]
  • Added support for sycl::queue::submit_barrier for HIP backend [53f1cce]
  • Added sycl::property_list APIs to sycl::stream [4b39793]
  • Defined sycl::access::decorated in the SYCL headers. [d183807]
  • Improved performance by allowing batching of wait-with-barrier commands for Level Zero backend [bd1ed6a]
  • Disabled image support for CUDA backend since the support is not complete [90c8f05]
  • Avoided JITing unnecessary device code when using sycl::handler::set_specialization_constant [5746906]
  • Updated image accessor constructor to make it possible to use const references in parallel_for. [3070b95]
  • Relaxed the mutex lock duration in queue finish for the Level Zero backend to allow working with the queue from other threads. [8573935]
  • Added XPTI instrumentation for USM allocations. [7df3923]
  • Extended XPTI information with buffer constructor data. [fc0d28a]
  • Added error handling for sycl::event::get_profiling_info(). [ec74a5c]
  • Eliminated recursion and duplicated dependencies in leaf buffers handling in the scheduler. [6f40376]
  • Improved runtime to emit program build logs when SYCL_RT_WARNING_LEVEL is set to 2 or higher. [008519a]
  • Improved the error message at command execution failure. [930ecbf]
  • Improved runtime to build program for root device only and re-use the binary for sub-devices to avoid "duplicate" builds. [13a7455]
  • Improved sycl::kernel::get_kernel_bundle performance. [4817b3f]
  • Changed USM pooling parameters for the Level Zero backend to boost performance. [57f8a44][b000db8]
  • Exposed value_type and min_capacity from SYCL pipes extension class. [e1619fa]
  • Improved thread-safety of the Level Zero plugin by guarding access to the PI objects. [3321141] [a37c10b] [bd80f34] [8f97fe2]
  • Optimized half builtins for fma, fmin and fmax on CUDA backend. [62651dd]
  • Improved runtime to redirect warning from using SYCL_DEVICE_FILTER with sycl-ls to std::cerr. [70593d6]
  • Used new SPIR-V group operations within uniform control flow instead of non-uniform operations in SYCL headers. [9b84dd8]
  • Enabled online linking of the device libraries. [9fcab29]
  • Improved esimd-verifier logic for detecting valid SYCL calls. [eaf8b42]
  • Extended XPTI information with the kernel info. [4b9eef3]
  • Improved error message for exceeding CUDA grid limits. [ed877c2]
  • Added overload for sycl::select(a, b, c) where c is a bool. [7ae8fd3]
  • Fixed batching related thresholds to improve performance. [c6313bd]
  • Added always_inline for libdevice functions, which allows an underlying runtime to inline them. [dfc87cc]
  • Improved performance by caching the result of zeKernelGetName in the Level Zero plugin. [40cece3]
  • Updated the experimental latency control API to use property list and deprecated the template argument approach. [273034a]
  • Renewed and synced the queue::parallel_for() with SYCL 2020. [e59fb89]
  • Improved runtime to ignore CUDA prefetch hint if not supported on the system and emit an optional warning in this case depending on warning level set using SYCL_RT_WARNING_LEVEL. [082929a]
  • Enabled pooling of small USM allocations for the Level Zero backend to improve performance. [6244efe]
  • Added a managed memory check to enqueue prefetch: if the memory provided is not managed, the prefetch hint is ignored and a warning is emitted. [0fe322c]
  • Enabled device code instrumentation by default. [53fc8e4]
  • Improved performance of queue::wait() on CUDA backend. [8b85a3c]
  • Optimized host event wait. [f3a0970]
  • Improved default selector to filter devices based on available device images. [0e67db8]
  • Enabled caching of native OpenCL and Level Zero executable binaries. [f0283fc]
  • Deprecated the sycl::ext::intel::ctz extension function; sycl::ctz from the core specification must be used instead. [50435a6]
  • Deprecated cl::sycl::atomic in SYCL 2020 mode. [52fad5a]
  • Deprecated ext_intel_host_device_space which is replaced by ext_intel_global_host_space. [7a2f44b]
  • Added CUDA-specific USM memory advice values. [e2e71a9]
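
The argument order of the select overload listed above is easy to misread: the condition comes last, and a true condition picks the second argument. The sketch below is a host-only stand-in for the scalar semantics defined by SYCL 2020 (result is c ? b : a). The name select_like is hypothetical and is not part of SYCL.

```cpp
// Host-side stand-in mirroring the scalar semantics of select(a, b, c):
// the result is b when c is true and a otherwise.
// (select_like is a hypothetical name used only for this illustration.)
template <typename T>
constexpr T select_like(T a, T b, bool c) {
  return c ? b : a;
}
```

Compared with the ternary operator, the operands appear in the opposite order, so `cond ? x : y` corresponds to `select_like(y, x, cond)`.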

Tools

  • Added an option --enable-esimd-emulator to enable esimd emulator build using configure.py. [ddbcbb1]
  • Added an ability to build plugins separately. [c16412b]
  • Added --enable-all-llvm-targets switch to configure.py. [5e6642a]
  • Added PI tracing support for loadOsLibrary. [647249c]

Documentation

Bug fixes

SYCL Compiler

  • Fixed a crash that happened if an overloaded new operator was used in a recursive function in the device code [2085978]
  • Fixed wrong address space of event_t which could lead to builtins like barriers working incorrectly when using HIP backend. [22532c2]
  • Fixed an issue with certain macros being unavailable when using a custom host compiler. [652417b]
  • Fixed an issue with device code linking when one of the targets is not spir64 based. [1f8874f]
  • Disabled part of SimplifyCFG optimizations in SYCL mode which was resulting in invalid optimizations in some cases. [8b29220]
  • Silenced "unknown attribute" warning emitted during host part of full -fsycl compilation when it saw [[intel::device_indirectly_callable]] attribute. [718c0b1]
  • Removed incorrect assertion for use of -fopenmp-new-driver for multiple inputs. [6e0f6d1]
  • Fixed the issue caused by using the nvvm_reflect function in the nvptx backend with -O0. [537e51b]
  • Fixed a regression in cases where function pointers were captured as kernel arguments. [b19e2e4]
  • Fixed the libclc remangler to clone functions rather than aliasing to enable DPC++ for CUDA with -O0. [7b2fb02]
  • Fixed the error "Explicit load/store type does not match pointee type of pointer operand" caused by incorrect address space. [e688fa5]
  • Fixed incorrect diagnostic for __spirv calls when the reqd_sub_group_size attribute is applied on a sycl kernel. [1df7b59]
  • Fixed alignment of emulated specialization constants. [0cec3c6]
  • Fixed the group collective implementation for AMDGCN. Fixed the shuffleUp and shuffleDown functions for the AMDGCN builtins and SYCL headers. [d99e957]
  • Removed llvm.nvvm.suq.depth instruction which was causing CUDA_ERROR_NOT_FOUND or CUDA_ERROR_NOT_SUPPORTED errors if present in the fatbin. [ec29322]

SYCL Library

  • Fixed alignment of kernel local arguments in the CUDA backend. [ebb1281]
  • Fixed a crash which could happen when building a program for multiple devices. [64c2d35]
  • Fixed max constant value query for the HIP backend. [1e55cf3]
  • Fixed ambiguity error with sycl::oneapi::experimental::this_nd_item. [8aad52d]
  • Fixed a performance issue caused by unnecessary command batching in the Level Zero plugin. [4d031a4]
  • Fixed an issue with sycl::get_pointer_device API working incorrectly for CUDA and HIP backends [8fa17b4]
  • Fixed an issue which might result in JITing for only one device while the context is associated with multiple devices for Level Zero backend [7068457]
  • Fixed namespace ambiguity in this_id, this_item, and this_group. [19369b6]
  • Worked around two bugs in the Level Zero driver related to static linking extension. [2930a94]
  • Fixed return type of get_native<sycl::backend::opencl>(event) from cl_event to vector<cl_event>. [a2189c6]
  • Modified Level Zero plugin support for copy engines to handle the case where the main copy engine is not available. [478a576]
  • Fixed support for query of USM capabilities. [5941394]
  • Fixed memory leak in the USM prefetch functionality. [5d4573f]
  • Fixed host device local accessor alignment. [08b14da]
  • Fixed sycl::errc values for exceptions per SYCL 2020. [270e78d]
  • Fixed bug with constexpr_recurse usage. [bd15de9]
  • Fixed max_work_group_size and reqd_work_group_size attribute arguments check. [7f37250]
  • Fixed iterator debug level mismatch error on Windows when building programs with /MDd when libsycl-fallback-cassert.obj is involved. [93b573a]
  • Fixed get_native() for sycl::event per requirements of the specification. [8878962]
  • Fixed device enumeration for the next platforms when current platform doesn't have devices. [0272ec2]
  • Fixed thread-safety issue in the scheduler which can appear if command gets cleaned up by another thread while adding a host accessor. [62ca43a]
  • Fixed reported device name for HIP backend. [68b089f]
  • Fixed SYCL_PROGRAM_COMPILE_OPTIONS and SYCL_PROGRAM_LINK_OPTIONS to override compile and link options respectively. [7a8fa1a]
  • Fixed incorrect handling of queue indexing for Level Zero backend. [1f4c9df]
  • Fixed memory leak in the reductions that require additional resources (such as buffers) [9aefea0]
  • Defined get_property/has_property in the queue for property::queue::in_order. [ca9fea6]
  • Fixed memory leak in the scheduler for run_on_host_intel commands. [86cf56a]
  • Fixed thread-safety issue caused by parallel access to the command list cache in the Level Zero plugin. [1f531c0]
  • Fixed device code outlining for static local variables to avoid invalid device code generation. [66e207e]
  • Fixed dynamic batching in the Level Zero plugin. [d6f115c]
  • Fixed unsigned long warning in fallback cstring on Windows. [5fbe02b]
  • Fixed sync of host task vs kernel for in-order queue. [c7ba937]
  • Fixed include dependency in fpga_lsu.hpp and pipes.hpp headers. [ac6a4f5]
  • Fixed kernel timestamp calculation in the Level Zero plugin. [c228f12] [7efb3e6] [76a3898]
  • Fixed usage of copy-engines in the Level Zero interoperability queue. [847f8b6]
  • Fixed kernel execution hang under large memory consumption by working around a bug in the Level Zero runtime. [b831bd0]
  • Fixed the Level Zero plugin to honor property::queue::enable_profiling. [fb27c65]
  • Fixed memory leak which existed when program build failed for the Level Zero backend. [beb7277]
  • Fixed buffer creation from rvalue iterator. [a905a27]
  • Made queue::device_has() private. [279ef0d]
  • Fixed crash for case when a device image has no kernels. [279ef89]
  • Fixed dependency between host/device actions for unbundled FPGA specific archives. [9699575]
  • Fixed PI CUDA plugin to avoid linking against libsycl which may cause issues on some systems. [6821e66]
  • Fixed support of bfloat16 for CUDA. [5231fe4]
  • Fixed interop_handle::get_native_mem so that it can work with accessors that use non-empty accessor_property_list. [5452a5d]
  • Fixed sub-device count calculation for numa partitioning. [ae284f1]
  • Fixed SYCL_ENABLE_PLUGINS to enable both the OpenCL and the Level Zero PI plugins if it is unset. [ff384bb]
  • Fixed BDF format on PCI query for the Level Zero backend. [ad6253e]
  • Fixed sycl::queue XPTI instrumentation. [ec57cd7]
  • Fixed interoperability return type for sycl::buffer to std::vector<cl_mem> per SYCL 2020. [8b3c8c4]
  • Fixed SYCL_DUMP_IMAGES handling to also dump when spec constants are on. [211ccda]
  • Fixed failure in case of using zero-size local accessor on some backends. [1292532]
  • Fixed flaky bug which might appear in multi-threaded applications with simultaneous access to the cache of device lib programs. [92cfd53]
  • Fixed compfail issue with -ffast-math on CUDA backend. [90ac3ee]
  • Fixed make_queue interoperability API for the Level Zero to accept device argument to properly associate queue with the right device. [29a5369]
  • Fixed invalid handler issue by updating OpenCL ICD loader from community.
  • Fixed "undefined symbol" error for ldexpf, hypotf, frexpf on SYCL GPU device when using 3rd-party math headers instead of MSVC math headers on Windows. [476a351]
  • Fixed memory leak for interop events created from native handle. [3c1d342]
  • Fixed alignment of the memory returned from USM allocation functions. [3114f02]
  • Fixed Vendor ID for AMD devices. [a1b42aa]
  • Fixed sporadic failure of in-order queue due to non-closed batch on the Level Zero backend. [e8bff05]
  • Fixed possible deadlock in case of having dependent events from different queues in a multi-threaded application. [4c619e9]
  • Fixed issue with delivery of assert message before aborting. [6a32706]
  • Fixed default value for the Alignment template parameter of the usm_allocator. [3a91cec]
  • Fixed API to get maximum width/height/depth of an image for the Level Zero backend. [9ecc74b]
  • Fixed runtime for CUDA backend to ignore USM memory advice and emit a warning if the device doesn't meet requirements. [ccaaa99]

Tools

  • Fixed sycl-post-link tool to properly handle the offset in specialization constant descriptors. [12d7c1f]
  • Fixed sycl-post-link tool to properly handle the padding at the end of composite types. [12d7c1f]
  • Fixed translation of Vector[Extract/Insert]Dynamic instructions in llvm-spirv [bf43d7b]
  • Fixed unconditional debug info generation for libsycl_profiler_collector.so [45784cd]
  • Fixed sycl-post-link failure caused by incorrect removal of llvm.used in the case when specialization constant has 2+ users. [80e9148]
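
The offset and end-of-type padding mentioned in the sycl-post-link fixes above follow from ordinary C++ layout rules. The sketch below uses a hypothetical composite type to show where tail padding comes from; it does not reproduce the descriptor format used by the tool, and the exact numbers depend on the target ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical composite type: a 4-byte integer followed by a 1-byte flag.
// The struct is aligned to 4 bytes, so its size is rounded up past `flag`.
struct Composite {
  std::int32_t count;
  std::uint8_t flag;
};

// Number of bytes between the end of the last member and the end of the
// struct, i.e. the padding at the end of the composite type.
constexpr std::size_t tail_padding() {
  return sizeof(Composite) -
         (offsetof(Composite, flag) + sizeof(std::uint8_t));
}
```

On common 64-bit ABIs `flag` sits at offset 4 and the struct occupies 8 bytes, so any tool that walks members by offset has to account for the bytes after the last member as well.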

Documentation

API/ABI breakages

  • Removed deprecated API from ESIMD headers. [ec0385d]
  • Renamed wi_slice to wi_data. [f364e18]
  • Renamed nbarrier_* API to named_barrier_* for ESIMD. [5023657]
  • Moved a part of ESIMD APIs outside of experimental namespace. [c557d78][b2ee289]
  • Moved bfloat16 from intel namespace to oneapi namespace. [5231fe4]

Known issues

  • Having a MESA OpenCL implementation which provides no devices on a
    system may cause incorrect device discovery. As a workaround such an OpenCL
    implementation can be disabled by removing /etc/OpenCL/vendors/mesa.icd.
  • Compilation may fail on Windows in debug mode if a kernel uses
    std::array. This happens because debug version of std::array in
    Microsoft STL C++ headers calls functions that are illegal for the device
    code. As a workaround the following can be done:
    1. Dump compiler pipeline execution strings by passing -### option to the
      compiler. The compiler will print the internal execution strings of
      compilation tools. The actual compilation will not happen.
    2. Modify the (usually) first execution string (it should have
      -fsycl-is-device option) by adding
      -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
      end of the string. Execute all strings one by one.
  • -fsycl-dead-args-optimization can't help eliminate offset of
    accessor even though it's created with no offset specified
  • SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers do [18c80fa]
  • When using fallback assert in separate compilation flow it requires explicit
    linking against lib/libsycl-fallback-cassert.o or
    lib/libsycl-fallback-cassert.spv
  • Limit alignment of allocation requests at 64KB which is the only alignment
    supported by Level Zero [7dfaf3b]
  • In the following scenario on Level Zero backend:
    1. Kernel A, which uses buffer A, is submitted to queue A.
    2. Kernel B, which uses buffer B, is submitted to queue B.
    3. queueA.wait().
    4. queueB.wait().
      DPCPP runtime used to treat unmap/write commands for buffer A/B as host
      dependencies (i.e. they were waited for prior to enqueueing any command
      that's dependent on them). This allowed Level Zero plugin to detect that
      each queue is idle on steps 1/2 and submit the command list right away.
      This is no longer the case since we started passing these dependencies in an
      event waitlist and Level Zero plugin attempts to batch these commands, so
      the execution of kernel B starts only on step 4. The workaround restores the
      old behavior in this case until this is resolved [2023e10][6c137f8].
  • User-defined functions with the name and signature matching those of any
    OpenCL C built-in function (i.e. an exact match of arguments, return type
    doesn't matter) can lead to Undefined Behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process
    execution. Creating a context opens the device associated with the context
    and places a lock on it for that process. No other process may use that
    device. Some queries about the device through device.get_info<>() also
    open up the device and lock it to that process since the runtime needs
    to query the actual device to obtain that information.
  • The format of the object files produced by the compiler can change between
    versions. The workaround is to rebuild the application.
  • Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined
    in another translation unit leads to undefined behavior
  • Linkage errors with the following message:
    error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
    can happen when a SYCL application is built using MS Visual Studio 2019
    version below 16.3.0 and user specifies -std=c++14 or /std:c++14.
  • Printing internal defines isn't supported on Windows [50628db]
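
For the built-in name collision listed above, the most direct way to stay clear of it is to give helper functions a name that no OpenCL C built-in uses. The sketch below assumes a helper that would otherwise have been called clamp with three float arguments, which matches a built-in name and argument list; the replacement name clamp_to_range is hypothetical.

```cpp
// A user helper that clamps a value to [low, high].
// Naming it `clamp(float, float, float)` would match the name and
// argument list of an OpenCL C built-in, so a distinct name is used.
// (clamp_to_range is a hypothetical name chosen for this illustration.)
inline float clamp_to_range(float value, float low, float high) {
  return value < low ? low : (value > high ? high : value);
}
```

Because the return type does not take part in the match, changing only the return type of a colliding function would not avoid the problem.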