New features
SYCL Compiler
Added the -fcuda-prec-sqrt frontend compiler option, which enables a higher-precision version of sqrt in device code. [ebf9ea8]
Added support for local memory accessors for the HIP backend. [58508ba]
Added initial support of -lname processing when searching for fat static libraries. [35e32d8] [a33f9c8]
Added -fsycl-fp32-prec-sqrt flag which enables correctly rounded sycl::sqrt. [5c8b7e7]
Added support for [[intel::loop_count()]] attribute. [c536e76]
Added support for passing driver options to JIT compiler and linker. [1c93bfe]
Added default argument support for work_group_size_hint attribute. [0cff80e]
Added support for float and double exchange and compare exchange atomic operations in CUDA libclc. [1d84c99]
Added -ffast-math support for CUDA libclc. [0f0c5d1]
Added support for software atomics (except for the ones using system scope) for lower sm versions of CUDA architecture. Enabled SYCL_USE_NATIVE_FP_ATOMICS by default. [7bc8447]
Added support for the global offset for AMDGPU. [2dc3c06]
Added support for asynchronous barrier for CUDA backend sm 80+. [6770421]
Added -f[no-]sycl-device-lib-jit-link option to control JIT linking of SYCL device libraries. [dfb37a8] [c946286]
Added support for the new FPGA attribute [[intel::fpga_pipeline(N)]] for loop pipelining. [92aadf3]
Added support for Nvidia MMA for bf16, mixed precision int ((u)int8/int32), and mixed precision float (half/float). [5373362]
Added a mode for the Level Zero plugin where only the last command in each batch yields a host-visible event. Enabled this mode by default. [c6b7b8e]
Added an option to query for atomic scope capabilities for the CUDA backend. Updated the values returned for atomic memory order capabilities. [43a4192]
Added support for an experimental Level Zero API for host pointer import into USM. The feature can be enabled using SYCL_USM_HOSTPTR_IMPORT environment variable. [844d7b6]
Added support for the wi_element for bf16 type. [9f2b7bd]
Added complex support for the reduce and scan group algorithms. [90a4dc7]
Added support for SYCL 2020 methods in the group class. [73d59ce]
Added the SYCL_RT_WARNING_LEVEL environment variable, which controls the amount of warnings and performance hints the runtime library may print. [2741010]
Added tanh (for floats/halfs) and exp2 (for halfs) native definitions for CUDA backend. [250c498]
Added bf16 builtins for fma, fmin and fmax on CUDA backend. [62651dd]
Added support for USM buffer location properties, which specify the memory location in which a device USM allocation should be placed. [12c988a]
Added support for buffer_location property to the sycl::buffer. [9808525]
Added single_task support for ESIMD_EMULATOR backend. [2331160]
Added support for SVM 1,2,4-elements gather/scatter for ESIMD. [e200720]
Added support for bf16 builtins operating on storage types for CUDA backend. [413a9ef]
Added support for backend_version device property for CUDA backend. [4b1a4bc]
Added support for round-robin submissions to multiple compute CCS for the Level Zero backend. Disabled by default, can be controlled using SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE. [a836c87]
Added support for buffer migration for contexts with multiple devices in the Level Zero plugin. [7baf152]
Added a mode in which the Level Zero plugin uses immediate command-lists instead of standard command-lists. This mode is disabled by default and can be enabled using the SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS environment variable. [b9cb1d1]
Added support for sycl::get_native(sycl::buffer) for OpenCL and CUDA backends. [8b3c8c4]
Introduced new environment variable SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ to control the max local memory allowed to be allocated per kernel on CUDA backend. [2e24304]
Implemented property set generation for device globals in the sycl-post-link. Added the --device-globals command-line argument for lowering and generating information about device global variables. [88123c1]
Introduced XPTI-based tools for SYCL applications: sycl-trace, sycl-prof, sycl-sanitize. [789d138]
Added support for tracing Level Zero API calls using XPTI and updated the sycl-trace tool to display both PI and Level Zero calls simultaneously. [fc9cf52]
Improvements
SYCL Compiler
Added a diagnostic on attempt to use zero length arrays in the device code [52e8f58]
Added support for consuming fat objects containing SPIR-V [c878063]
Added support for generating SPIR-V based fat objects [1e94ef3]
Added support for group collective functions for HIP backend [106882c]
Added a diagnostic on attempt to use -fsycl and -static-libstdc++ together. This combination is not supported because of the runtime dependency on libsycl.so [bb0055c]
Added support for atomic loads and stores with various memory orders and scopes [e15ac50] [6b2635e]
Improved performance of accessing memory pointed to by sycl::accessor for FPGA device [fbab374]
Added support for setting CUDA installation directory using CUDA_PATH environment variable [b0c145a]
Improved deferred diagnostics for usages within function templates in device code. [d0efca5]
Added support for the sycl_special_class attribute to mark SYCL classes/structs that need additional compiler handling. [8ba9c79]
Improved driver to do device section checking only when offloading is enabled. [3742b93]
Allowed calls to constant expression function pointers in device code. [e84c952]
Disabled passing of code coverage and profiling options to device compilation.
Added clang support of code location information for kernels. [96d2e17]
Disallowed explicit casts between mismatching address spaces. [1cee960]
Added support of [[sycl::device_has]] attribute on kernel. [aa2162c]
Added a warning on explicit cast from default address space to named. [9adb25b]
Added a warning for converting 'c' input to 'c++' in SYCL mode. [5b62ee0]
Silenced unknown attribute warnings on host compilation. [2d359df]
Added a diagnostic on attempt to use accessor::operator[] in ESIMD code. [9d7a651]
Expanded driver's ability to discover fat static archives after /link option on Windows. [271ef40]
Added support for saving user specified names for lambda captures in kernel lambda object for FPGA target. [af29982] [5ffb2ee]
Adjusted the compilation when preprocessing is requested to allow for the device compilation to fail and continue to perform the preprocessing steps. [7f2e99c]
Added the ability to detect a kernel size mismatch in the case when the host and device compilers are different. [ef90e6a]
Improved handling of specialization constants by backends. [e62b5aa]
Improved support of -mlong-double options. [6083920]
Improved -save-temps to allow optimization when performing a SYCL device compilation. [05fe5ae]
Removed warning diagnostic on host compilation when using __attribute__((sycl_device)). [49e595e]
Introduced multiple streams in each queue for CUDA backend to improve concurrent execution. [dd41845]
Improved compiler to collect information for the optimization record only if the optimization record is saved by the user (i.e. -fsave-optimization-record or -opt-record-file is passed). [cb94c80]
Added the new kernel_arg_exclusive_ptr metadata which guarantees that the kernel pointer argument, or pointers that derive from it, will not be dereferenced outside current invocation of the kernel. [e03c4ed]
Added a warning for the case when invalid subgroup size is used on kernel for CUDA backend. [6dab69f]
Improved deprecation messaging for options. [f0b65a1]
Improved diagnostic behavior for -fsanitize with -fsycl. [9397cbc]
Changed USM pooling parameters for the Level Zero backend to boost performance. [57f8a44][b000db8]
Exposed value_type and min_capacity from SYCL pipes extension class. [e1619fa]
Improved thread-safety of the Level Zero plugin by guarding access to the PI objects. [3321141] [a37c10b] [bd80f34] [8f97fe2]
Optimized half builtins for fma, fmin and fmax on CUDA backend. [62651dd]
Improved runtime to redirect warning from using SYCL_DEVICE_FILTER with sycl-ls to std::cerr. [70593d6]
Switched SYCL headers to use new SPIR-V group operations within uniform control flow instead of non-uniform operations. [9b84dd8]
Enabled online linking of the device libraries. [9fcab29]
Improved esimd-verifier logic for detecting valid SYCL calls. [eaf8b42]
Extended XPTI information with the kernel info. [4b9eef3]
Improved error message for exceeding CUDA grid limits. [ed877c2]
Added overload for sycl::select(a, b, c) where c is a bool. [7ae8fd3]
Fixed batching related thresholds to improve performance. [c6313bd]
Added always_inline for libdevice functions, which allows an underlying runtime to do inlining. [dfc87cc]
Improved performance by caching the result of zeKernelGetName in the Level Zero plugin. [40cece3]
Updated the experimental latency control API to use a property list and deprecated the template argument approach. [273034a]
Updated queue::parallel_for() to match SYCL 2020. [e59fb89]
Improved runtime to ignore CUDA prefetch hint if not supported on the system and emit an optional warning in this case depending on warning level set using SYCL_RT_WARNING_LEVEL. [082929a]
Enabled pooling of small USM allocations for the Level Zero backend to improve performance. [6244efe]
Added a managed memory check to enqueue prefetch, which now ignores the prefetch hint and emits a warning if the memory provided is not managed. [0fe322c]
Enabled device code instrumentation by default. [53fc8e4]
Improved performance of queue::wait() on CUDA backend. [8b85a3c]
Removed extension specifications which were adopted into SYCL 2020. Please refer to extensions/removed/README for the list of removed extensions. [ae7e3d8] [57c9017]
Bug fixes
SYCL Compiler
Fixed a crash that happened when an overloaded new operator was used in a recursive function in device code. [2085978]
Fixed wrong address space of event_t, which could cause builtins such as barriers to work incorrectly when using HIP backend. [22532c2]
Fixed an issue with certain macros being unavailable when using a custom host compiler. [652417b]
Fixed an issue with device code linking when one of the targets is not spir64 based. [1f8874f]
Disabled part of SimplifyCFG optimizations in SYCL mode which was resulting in invalid optimizations in some cases. [8b29220]
Silenced "unknown attribute" warning emitted during host part of full -fsycl compilation when it saw [[intel::device_indirectly_callable]] attribute. [718c0b1]
Removed incorrect assertion for use of -fopenmp-new-driver for multiple inputs. [6e0f6d1]
Fixed the issue caused by using the nvvm_reflect function in the nvptx backend with -O0. [537e51b]
Fixed a regression in cases where function pointers were captured as kernel arguments. [b19e2e4]
Fixed the libclc remangler to clone functions rather than aliasing to enable DPC++ for CUDA with -O0. [7b2fb02]
Fixed the error "Explicit load/store type does not match pointee type of pointer operand" caused by incorrect address space. [e688fa5]
Fixed incorrect diagnostic for __spirv calls when the reqd_sub_group_size attribute is applied on a sycl kernel. [1df7b59]
Fixed alignment of emulated specialization constants. [0cec3c6]
Fixed the group collective implementation for AMDGCN. Fixed the shuffleUp and shuffleDown functions for the AMDGCN builtins and SYCL headers. [d99e957]
Removed llvm.nvvm.suq.depth instruction which was causing CUDA_ERROR_NOT_FOUND or CUDA_ERROR_NOT_SUPPORTED errors if present in the fatbin. [ec29322]
SYCL Library
Fixed alignment of kernel local arguments in the CUDA backend. [ebb1281]
Fixed a crash which could happen when building a program for multiple devices. [64c2d35]
Fixed max constant value query for the HIP backend. [1e55cf3]
Fixed ambiguity error with sycl::oneapi::experimental::this_nd_item. [8aad52d]
Fixed a performance issue caused by unnecessary command batching in the Level Zero plugin. [4d031a4]
Fixed an issue with sycl::get_pointer_device API working incorrectly for CUDA and HIP backends [8fa17b4]
Fixed an issue which might result in JITing for only one device while the context is associated with multiple devices for Level Zero backend [7068457]
Fixed namespace ambiguity in this_id, this_item, and this_group. [19369b6]
Worked around two bugs in the Level Zero driver related to the static linking extension. [2930a94]
Fixed the return type of get_native<sycl::backend::opencl>(event), changing it from cl_event to vector<cl_event>. [a2189c6]
Modified Level Zero plugin support for copy engines to handle the scenario in which the main copy engine is not available. [478a576]
Fixed support for query of USM capabilities. [5941394]
Fixed memory leak in the USM prefetch functionality. [5d4573f]
Fixed host device local accessor alignment. [08b14da]
Fixed sycl::errc values for exceptions per SYCL 2020. [270e78d]
Fixed interoperability return type for sycl::buffer to std::vector<cl_mem> per SYCL 2020. [8b3c8c4]
Fixed SYCL_DUMP_IMAGES handling to also dump when spec constants are on. [211ccda]
Fixed failure in case of using zero-size local accessor on some backends. [1292532]
Fixed a flaky bug which might appear in multi-threaded applications with simultaneous access to the cache of device lib programs. [92cfd53]
Fixed a compilation failure with -ffast-math on CUDA backend. [90ac3ee]
Fixed the make_queue interoperability API for Level Zero to accept a device argument so that the queue is associated with the right device. [29a5369]
Fixed invalid handler issue by updating OpenCL ICD loader from community.
Fixed "undefined symbol" error for ldexpf, hypotf, frexpf on SYCL GPU device when using 3rd-party math headers instead of MSVC math headers on Windows. [476a351]
Fixed memory leak for interop events created from native handle. [3c1d342]
Fixed alignment of the memory returned from USM allocation functions. [3114f02]
Moved properties and property-related APIs into sycl::ext::oneapi::experimental. sycl_ext_oneapi_properties specification was updated to revision 2. [33fdc58][aacf541]
Renamed nbarrier_* API to named_barrier_* for ESIMD. [5023657]
Moved a part of ESIMD APIs outside of experimental namespace. [c557d78][b2ee289]
Moved bfloat16 from intel namespace to oneapi namespace. [5231fe4]
Known issues
Having a MESA OpenCL implementation which provides no devices on a
system may cause incorrect device discovery. As a workaround, such an OpenCL
implementation can be disabled by removing /etc/OpenCL/vendors/mesa.icd.
Compilation may fail on Windows in debug mode if a kernel uses std::array. This happens because the debug version of std::array in
Microsoft STL C++ headers calls functions that are illegal for the device
code. As a workaround the following can be done:
Dump compiler pipeline execution strings by passing the -### option to the
compiler. The compiler will print the internal execution strings of
compilation tools. The actual compilation will not happen.
Modify the (usually) first execution string (it should have the -fsycl-is-device option) by adding the -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
end of the string. Execute all the strings one by one.
-fsycl-dead-args-optimization cannot eliminate the offset of an
accessor even when the accessor is created with no offset specified
SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers do [18c80fa]
Using the fallback assert in a separate compilation flow requires explicit
linking against lib/libsycl-fallback-cassert.o or lib/libsycl-fallback-cassert.spv
Limit alignment of allocation requests at 64KB which is the only alignment
supported by Level Zero [7dfaf3b]
In the following scenario on the Level Zero backend:
1. Kernel A, which uses buffer A, is submitted to queue A.
2. Kernel B, which uses buffer B, is submitted to queue B.
3. queueA.wait().
4. queueB.wait().
The DPC++ runtime used to treat unmap/write commands for buffer A/B as host
dependencies (i.e. they were waited for prior to enqueueing any command
that's dependent on them). This allowed Level Zero plugin to detect that
each queue is idle on steps 1/2 and submit the command list right away.
This is no longer the case since we started passing these dependencies in an
event waitlist and Level Zero plugin attempts to batch these commands, so
the execution of kernel B starts only on step 4. The workaround restores the
old behavior in this case until this is resolved [2023e10][6c137f8].
User-defined functions with the name and signature matching those of any
OpenCL C built-in function (i.e. an exact match of arguments, return type
doesn't matter) can lead to Undefined Behavior.
A DPC++ system that has FPGAs installed does not support multi-process
execution. Creating a context opens the device associated with the context
and places a lock on it for that process. No other process may use that
device. Some queries about the device through device.get_info<>() also
open up the device and lock it to that process since the runtime needs
to query the actual device to obtain that information.
The format of the object files produced by the compiler can change between
versions. The workaround is to rebuild the application.
Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined
in another translation unit leads to undefined behavior
Linkage errors with the following message: error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
can happen when a SYCL application is built using MS Visual Studio 2019
version below 16.3.0 and user specifies -std=c++14 or /std:c++14.
Printing internal defines isn't supported on Windows [50628db]
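For the known issue above about user-defined functions whose name and signature match an OpenCL C built-in, one way to reduce the risk is to avoid declaring such functions at global scope. The following is a minimal sketch in plain C++, not taken from these notes. It assumes that giving the function a different fully qualified name (here by placing it in a namespace) is enough to avoid the clash. The names `my_app`, `clamp` and `saturate_to_unit` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>

// Hypothetical application namespace; the name `my_app` is illustrative only.
namespace my_app {

// Same name and argument types as the OpenCL C built-in
// clamp(float, float, float). Declared at global scope, this function would
// match the built-in's name and signature. Inside a namespace it has a
// different fully qualified name.
// Assumption: a distinct qualified name is sufficient to avoid the clash.
inline float clamp(float x, float lo, float hi) {
    return std::min(std::max(x, lo), hi);
}

}  // namespace my_app

// Helper that always calls the user-defined function by its qualified name,
// so there is no ambiguity about which clamp is meant.
inline float saturate_to_unit(float x) {
    return my_app::clamp(x, 0.0f, 1.0f);
}
```

Calling `saturate_to_unit(2.5f)` clamps the value to the upper bound of the unit interval. Renaming the function (for example, to `clamp_value`) would be an equally simple alternative under the same assumption.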