Skip to content

Commit e127a2e

Browse files
[SYCL][Doc] Update KernelProgramCache.md with details on in-memory eviction (#16129)
This PR updates KernelProgramCache.md by: 1. Making the doc in-sync with recent changes in in-memory cache (Like #11351) 2. Adding details of in-memory cache eviction (#16062) Partially fixes: #16017 --------- Co-authored-by: Marcos Maronas <marcos.maronas@intel.com>
1 parent 74cda4b commit e127a2e

File tree

1 file changed

+39
-39
lines changed

1 file changed

+39
-39
lines changed

sycl/doc/design/KernelProgramCache.md

Lines changed: 39 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
## Rationale behind caching
44

5-
During SYCL program execution SYCL runtime will create internal objects
5+
During SYCL program execution, the SYCL runtime will create internal objects
66
representing kernels and programs, it may also invoke JIT compiler to bring
77
kernels in a program to executable state. Those runtime operations are quite
88
expensive, and in some cases caching approach can be employed to eliminate
@@ -65,7 +65,7 @@ examples below illustrate scenarios where such optimization is possible.
6565
});
6666
```
6767

68-
In both cases SYCL runtime will need to build the program and kernels multiple
68+
In both cases, the SYCL runtime will need to build the program and kernels multiple
6969
times, which may involve JIT compilation and take quite a lot of time.
7070

7171
In order to eliminate this waste of run-time we introduce a kernel and program
@@ -97,42 +97,27 @@ The cache is split into two levels:
9797

9898
### In-memory cache
9999

100-
The cache stores underlying PI objects behind `sycl::program` and `sycl::kernel`
101-
user-level objects in a per-context data storage. The storage consists of two
102-
maps: one is for programs and the other is for kernels.
100+
The cache stores the underlying UR objects behind `sycl::program` and `sycl::kernel`
101+
user-level objects in a per-context data storage. The storage consists of three
102+
maps: one is for programs and the other two are for kernels.
103103

104104
The programs map's key consists of four components:
105105

106-
- kernel set id<sup>[1](#what-is-ksid)</sup>,
106+
- ID of the device image containing the program,
107107
- specialization constants values,
108-
- the device this program is built for,
109-
- build options id <sup>[2](#what-is-bopts)</sup>.
108+
- the set of devices this program is built for.
110109

111110
The kernels map's key consists of two components:
112111

113112
- the program the kernel belongs to,
114113
- kernel name<sup>[3](#what-is-kname)</sup>.
115114

116-
(what-is-ksid)=
117-
<a name="what-is-ksid">1</a>: Kernel set id is an ordinal number of the device
118-
binary image the kernel is contained in.
115+
The third map, called Fast Kernel Cache, is used as an optimization to reduce the
116+
number of lookups in the kernels map. Its key consists of the following components:
119117

120-
(what-is-bopts)=
121-
<a name="what-is-bopts">2</a>: The concatenation of build options (both compile
122-
and link options) set in application or environment variables. There are three
123-
sources of build options that the cache is aware of:
124-
125-
- from device image (pi_device_binary_struct::CompileOptions,
126-
pi_device_binary_struct::LinkOptions);
127-
- environment variables (SYCL_PROGRAM_COMPILE_OPTIONS,
128-
SYCL_PROGRAM_LINK_OPTIONS);
129-
- options passed through SYCL API.
130-
131-
Note: Backend runtimes used by SYCL can have extra environment or configurations
132-
values (e.g. IGC has
133-
[igc_flags.def](https://github.com/intel/intel-graphics-compiler/blob/7f91dd6b9f2ca9c1a8ffddd04fa86461311c4271/IGC/common/igc_flags.def)
134-
which affect JIT process). Changing such configuration will invalidate cache and
135-
manual cache cleanup should be done.
118+
- specialization constants values,
119+
- the UR handle of the device this kernel is built for,
120+
- kernel name<sup>[3](#what-is-kname)</sup>.
136121

137122
(what-is-kname)=
138123
<a name="what-is-kname">3</a>: Kernel name is a kernel ID mangled class' name
@@ -141,7 +126,7 @@ which is provided to methods of `sycl::handler` (e.g. `parallel_for` or
141126

142127
### Persistent cache
143128

144-
The cache works behind in-memory cache and stores the same underlying PI
129+
The cache works behind in-memory cache and stores the same underlying UR
145130
object behind `sycl::program` user-level objects in a per-context data storage.
146131
The storage is organized as a map for storing device code image. It uses
147132
different keys to address difference in SYCL objects ids between applications
@@ -177,20 +162,33 @@ values for `info::platform::name`, `info::device::name`,
177162
differentiate different HW and SW installed on the same host as well as SW/HW
178163
upgrades.
179164

165+
(what-is-bopts)=
180166
<a name="what-is-bopts">3</a>: Hash for the concatenation of build options (both
181167
compile and link options) set in application or environment variables. There are
182168
three sources of build options:
183169

184-
- from device image (pi_device_binary_struct::CompileOptions,
185-
pi_device_binary_struct::LinkOptions);
170+
- from device image (sycl_device_binary_struct::CompileOptions,
171+
sycl_device_binary_struct::LinkOptions);
186172
- environment variables (SYCL_PROGRAM_COMPILE_OPTIONS,
187173
SYCL_PROGRAM_LINK_OPTIONS);
188174
- options passed through SYCL API.
189175

190176
## Cache configuration
191177

192-
The environment variables which affect cache behavior are described in
193-
[EnvironmentVariables.md](../EnvironmentVariables.md).
178+
The following environment variables affect the cache behavior:
179+
180+
| Environment variable | Values | Description |
181+
| -------------------- | ------ | ----------- |
182+
| `SYCL_CACHE_DIR` | Path | Path to persistent cache root directory. Default values are `%AppData%\libsycl_cache` for Windows and `$XDG_CACHE_HOME/libsycl_cache` on Linux, if `XDG_CACHE_HOME` is not set then `$HOME/.cache/libsycl_cache`. When none of the environment variables are set SYCL persistent cache is disabled. |
183+
| `SYCL_CACHE_PERSISTENT` | '1' or '0' | Controls persistent device compiled code cache. Turns it on if set to '1' and turns it off if set to '0'. When cache is enabled SYCL runtime will try to cache and reuse JIT-compiled binaries. Default is off. |
184+
| `SYCL_CACHE_IN_MEM` | '1' or '0' | Enable ('1') or disable ('0') in-memory caching of device compiled code. When cache is enabled SYCL runtime will try to cache and reuse JIT-compiled binaries. Default is '1'. |
185+
| `SYCL_IN_MEM_CACHE_EVICTION_THRESHOLD` | Positive integer | `SYCL_IN_MEM_CACHE_EVICTION_THRESHOLD` accepts an integer that specifies the maximum size of the in-memory program cache in bytes. Eviction is performed when the cache size exceeds the threshold. The default value is 0 which means that eviction is disabled. |
186+
| `SYCL_CACHE_EVICTION_DISABLE` | Any(\*) | Switches persistent cache eviction off when the variable is set. |
187+
| `SYCL_CACHE_MAX_SIZE` | Positive integer | Persistent cache eviction is triggered once total size of cached images exceeds the value in megabytes (default - 8 192 for 8 GB). Set to 0 to disable size-based cache eviction. |
188+
| `SYCL_CACHE_THRESHOLD` | Positive integer | Persistent cache eviction threshold in days (default value is 7 for 1 week). Set to 0 for disabling time-based cache eviction. |
189+
| `SYCL_CACHE_MIN_DEVICE_IMAGE_SIZE` | Positive integer | Minimum size of device code image in bytes which is reasonable to cache on disk because disk access operation may take more time than do JIT compilation for it. Applicable only for persistent cache. Default value is 0 to cache all images. |
190+
| `SYCL_CACHE_MAX_DEVICE_IMAGE_SIZE` | Positive integer | Maximum size of device image in bytes which is cached. Caching big kernels may overload the disk very fast. Applicable only for persistent cache. Default value is 1 GB. |
191+
194192

195193
## Implementation details
196194

@@ -248,7 +246,7 @@ queue). Possibility of enqueueing multiple cacheable kernels simultaneously
248246
from multiple threads requires us to provide thread-safety for the caching
249247
mechanisms.
250248

251-
It is worth of noting that we don't cache the PI resource (kernel or program)
249+
It is worth of noting that we don't cache the UR resource (kernel or program)
252250
by itself. Instead we augment the resource with the status of build process.
253251
Hence, what is cached is a wrapper structure `BuildResult` which contains three
254252
information fields - pointer to built resource, build error (if applicable) and
@@ -296,7 +294,7 @@ class implements RAII to make code look cleaner a bit. Now, GetCache function
296294
will return the mapping to be employed that includes the 3 components: kernel
297295
name, device as well as any specialization constants values. These get added to
298296
`BuildResult` and are cached. The `BuildResult` structure is specialized with
299-
either `PiKernel` or `PiProgram`<sup>[1](#remove-pointer)</sup>.
297+
either `ur_kernel_handle_t` or `ur_program_handle_t`<sup>[1](#remove-pointer)</sup>.
300298

301299
### Hash function
302300

@@ -351,7 +349,7 @@ The device code image are stored on file system using structure below:
351349

352350
- `<cache_root>` - root directory storing cache files, that depends on
353351
environment variables (see SYCL_CACHE_DIR description in the
354-
[EnvironmentVariables.md](../EnvironmentVariables.md));
352+
[Cache configuration](#cache-configuration));
355353
- `<device_hash>` - hash out of device information used to identify target
356354
device;
357355
- `<device_image_hash>` - hash made out of device image used as input for the
@@ -408,10 +406,12 @@ LRU (least recently used) strategy both for in-memory and persistent cache.
408406

409407
#### In-memory cache eviction
410408

411-
It is initiated on program/kernel maps access/add item operation. When cache
412-
size exceeds storage threshold the items which are least recently used are
413-
deleted.
414-
TODO: add detailed description of in-memory cache eviction mechanism.
409+
Eviction in in-memory cache is disabled by default but can be controlled by SYCL_IN_MEM_CACHE_EVICTION_THRESHOLD
410+
environment variable. The threshold is set in bytes and when the cache size exceeds the threshold the eviction process is initiated. The eviction process is based on LRU strategy. The cache is walked through and the least recently used items are deleted until the cache size is below the threshold.
411+
To implement eviction for in-memory cache efficiently, we store the programs in a linked-list, called the eviction list. When the program is first added to the cache, it is also added to the back of the eviction list. When a program is fetched from cache, we move the program to the end of the eviction list. This way, we ensure that the programs at the beginning of the eviction list are always the least recently used.
412+
When adding a new program to cache, we check if the size of the program cache exceeds the threshold, if so, we iterate through the eviction list starting from the front and delete the programs until the cache size is below the threshold. When a program is deleted from the cache, we also evict its corresponding kernels from both of the kernel caches.
413+
414+
***If the application runs out-of-memory,*** either due to cache eviction being disabled or the cache eviction threshold being too high, we will evict all the items from program and kernel caches.
415415

416416
#### Persistent cache eviction
417417

0 commit comments

Comments
 (0)