Skip to content

Commit fcc330e

Browse files
authored
[SYCL] Allow alignment property to be used for group load/store (#16882)
It makes possible to provide alignment<value> property to the load/store operations indicating the known alignment of the pointer. It will allow to avoid expensive dynamic alignment checks.
1 parent 8b83ba8 commit fcc330e

File tree

7 files changed

+1622
-1378
lines changed

7 files changed

+1622
-1378
lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc

Lines changed: 47 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -101,11 +101,13 @@ in the group.
101101
and default constructible.
102102
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
103103

104+
_Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
105+
104106
_Effects_: Loads single element from `in_iter` to `out` by using the `g` group
105107
object to identify memory location as `in_iter` + `g.get_local_linear_id()`.
106108

107-
Properties may provide xref:optimization_properties[assertions] which can
108-
enable better optimizations.
109+
Properties may provide xref:optimization_properties[assertions] or the `alignment` property
110+
which can enable better optimizations.
109111

110112
==== `sycl::vec` Overload
111113

@@ -132,6 +134,8 @@ in the group.
132134
and default constructible.
133135
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
134136

137+
_Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
138+
135139
_Effects_: Loads `N` elements from `in_iter` to `out`
136140
using the `g` group object.
137141
Properties may specify xref:data_placement[data placement].
@@ -140,8 +144,9 @@ Default data placement is a blocked one:
140144
in striped case:
141145
`out[i]` = `in_iter[g.get_local_linear_id() + g.get_local_linear_range() * i];`
142146
for `i` between `0` and `N`.
143-
Properties may also provide xref:optimization_properties[assertions] which can
144-
enable better optimizations.
147+
Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
148+
which can enable better optimizations.
149+
145150

146151
==== Fixed-size Array Overload
147152

@@ -169,6 +174,8 @@ work-group or sub-group.
169174
and default constructible.
170175
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
171176

177+
_Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
178+
172179
_Effects_: Loads `ElementsPerWorkItem` elements from `in_iter` to `out`
173180
using the `g` group object.
174181
Properties may specify xref:data_placement[data placement].
@@ -177,8 +184,9 @@ Default placement is a blocked one:
177184
in striped case:
178185
`out[i]` = `in_iter[g.get_local_linear_id() + g.get_local_linear_range() * i];`
179186
for `i` between `0` and `ElementsPerWorkItem`.
180-
Properties may also provide xref:optimization_properties[assertions] which can
181-
enable better optimizations.
187+
Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
188+
which can enable better optimizations.
189+
182190

183191

184192
=== Store API
@@ -209,11 +217,13 @@ in the group.
209217
and default constructible.
210218
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
211219

220+
_Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
221+
212222
_Effects_: Stores single element `in` to `out_iter` by using the `g` group
213223
object to identify memory location as `out_iter` + `g.get_local_linear_id()`
214224

215-
Properties may provide xref:optimization_properties[assertions] which can
216-
enable better optimizations.
225+
Properties may provide xref:optimization_properties[assertions] or the `alignment` property
226+
which can enable better optimizations.
217227

218228

219229
==== `sycl::vec` Overload
@@ -241,6 +251,8 @@ in the group.
241251
and default constructible.
242252
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
243253

254+
_Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
255+
244256
_Effects_: Stores `N` elements from `in` vec to `out_iter`
245257
using the `g` group object.
246258
Properties may specify xref:data_placement[data placement].
@@ -249,8 +261,8 @@ Default placement is a blocked one:
249261
in striped case:
250262
`out_iter[g.get_local_linear_id() + g.get_local_linear_range() * i]` = `in[i];`
251263
for `i` between `0` and `N`.
252-
Properties may also provide xref:optimization_properties[assertions] which can
253-
enable better optimizations.
264+
Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
265+
which can enable better optimizations.
254266

255267

256268
==== Fixed-size Array Overload
@@ -280,6 +292,8 @@ work-group or sub-group.
280292
and default constructible.
281293
* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
282294

295+
_Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
296+
283297
_Effects_: Stores `ElementsPerWorkItem` elements from `in` span to `out_iter`
284298
using the `g` group object.
285299

@@ -289,8 +303,9 @@ Default placement is a blocked one:
289303
in striped case:
290304
`out_iter[g.get_local_linear_id() + g.get_local_linear_range() * i]` = `in[i];`
291305
for `i` between `0` and `ItemsPerWorkItem`.
292-
Properties may also provide xref:optimization_properties[assertions] which can
293-
enable better optimizations.
306+
Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
307+
which can enable better optimizations.
308+
294309

295310
=== Data Placement
296311

@@ -442,6 +457,23 @@ so the implementation can rely on `get_max_local_range()` range size:
442457

443458
If partition is uneven the behavior is undefined.
444459

460+
== Alignment
461+
462+
If `InputIteratorT`/`OutputIteratorT` is a pointer then the following property can be used
463+
to provide an alignment of the pointer. It can allow to avoid dynamic alignment check.
464+
465+
```c++
466+
namespace sycl::ext::oneapi::experimental {
467+
struct alignment_key {
468+
template <int K>
469+
using value_t = property_value<alignment_key, std::integral_constant<int, K>>;
470+
};
471+
472+
template<int K>
473+
inline constexpr alignment_key::value_t<K> alignment;
474+
} // namespace sycl::ext::oneapi::experimental
475+
```
476+
445477
== Usage Example
446478

447479
Example shows the simplest case without local memory usage of blocked load
@@ -458,8 +490,8 @@ constexpr std::size_t block_count = 2;
458490
constexpr std::size_t size = block_count * block_size * items_per_thread;
459491
460492
sycl::queue q;
461-
T* input = sycl::malloc_device<T>(size, q);
462-
T* output = sycl::malloc_device<T>(size, q);
493+
T* input = sycl::aligned_alloc_device<T>(16, size, q);
494+
T* output = sycl::aligned_alloc_device<T>(16, size, q);
463495
464496
q.submit([&](sycl::handler& cgh) {
465497
cgh.parallel_for(
@@ -472,7 +504,7 @@ q.submit([&](sycl::handler& cgh) {
472504
auto offset = g.get_group_id(0) * g.get_local_range(0) *
473505
items_per_thread;
474506
475-
auto props = sycl_exp::properties{sycl_exp::contiguous_memory};
507+
auto props = sycl_exp::properties{sycl_exp::contiguous_memory, sycl_exp::alignment<16>};
476508
477509
sycl_exp::group_load(g, input + offset, sycl::span{ data }, props);
478510

sycl/include/sycl/ext/oneapi/experimental/group_load_store.hpp

Lines changed: 79 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@
1010

1111
#pragma once
1212

13+
#include <sycl/ext/oneapi/experimental/annotated_ptr/annotated_ptr.hpp>
1314
#include <sycl/ext/oneapi/properties/properties.hpp>
1415
#include <sycl/group_barrier.hpp>
1516
#include <sycl/sycl_span.hpp>
@@ -255,25 +256,29 @@ constexpr auto get_block_op_ptr(IteratorT iter,
255256
}
256257
}
257258

258-
template <int RequiredAlign, typename IteratorType>
259-
bool is_aligned(IteratorType iter) {
259+
template <int RequiredAlign, typename IteratorType, typename Properties>
260+
bool is_aligned(IteratorType iter, [[maybe_unused]] Properties props) {
260261
using value_type = remove_decoration_t<
261262
typename std::iterator_traits<IteratorType>::value_type>;
263+
264+
if constexpr (Properties::template has_property<alignment_key>()) {
265+
if (Properties::template get_property<alignment_key>().value >=
266+
RequiredAlign)
267+
return true;
268+
}
269+
262270
return alignof(value_type) >= RequiredAlign ||
263271
reinterpret_cast<uintptr_t>(&*iter) % RequiredAlign == 0;
264272
}
265273

266-
} // namespace detail
267-
268-
// Load API span overload.
269274
template <typename Group, typename InputIteratorT, typename OutputT,
270275
std::size_t ElementsPerWorkItem,
271276
typename Properties = decltype(properties())>
272277
std::enable_if_t<detail::verify_load_types<InputIteratorT, OutputT> &&
273278
detail::is_generic_group_v<Group> &&
274279
is_property_list_v<Properties>>
275-
group_load(Group g, InputIteratorT in_ptr,
276-
span<OutputT, ElementsPerWorkItem> out, Properties props = {}) {
280+
group_load_impl(Group g, InputIteratorT in_ptr,
281+
span<OutputT, ElementsPerWorkItem> out, Properties props = {}) {
277282
constexpr bool blocked = detail::isBlocked(props);
278283
using use_naive =
279284
detail::merged_properties_t<Properties,
@@ -286,7 +291,7 @@ group_load(Group g, InputIteratorT in_ptr,
286291
group_barrier(g);
287292
return;
288293
} else if constexpr (!std::is_same_v<Group, sycl::sub_group>) {
289-
return group_load(g, in_ptr, out, use_naive{});
294+
return group_load_impl(g, in_ptr, out, use_naive{});
290295
} else {
291296
auto ptr = detail::get_block_op_ptr<ElementsPerWorkItem>(in_ptr, props);
292297
static constexpr auto deduced_address_space =
@@ -297,12 +302,12 @@ group_load(Group g, InputIteratorT in_ptr,
297302
access::address_space::generic_space) {
298303
if (auto local_ptr = detail::dynamic_address_cast<
299304
access::address_space::local_space>(ptr)) {
300-
return group_load(g, local_ptr, out, props);
305+
return group_load_impl(g, local_ptr, out, props);
301306
} else if (auto global_ptr = detail::dynamic_address_cast<
302307
access::address_space::global_space>(ptr)) {
303-
return group_load(g, global_ptr, out, props);
308+
return group_load_impl(g, global_ptr, out, props);
304309
} else {
305-
return group_load(g, in_ptr, out, use_naive{});
310+
return group_load_impl(g, in_ptr, out, use_naive{});
306311
}
307312
} else {
308313
using value_type = remove_decoration_t<
@@ -314,8 +319,8 @@ group_load(Group g, InputIteratorT in_ptr,
314319
constexpr int ReqAlign =
315320
detail::RequiredAlignment<detail::operation_type::load,
316321
deduced_address_space>::value;
317-
if (!detail::is_aligned<ReqAlign>(in_ptr))
318-
return group_load(g, in_ptr, out, use_naive{});
322+
if (!detail::is_aligned<ReqAlign>(in_ptr, props))
323+
return group_load_impl(g, in_ptr, out, use_naive{});
319324

320325
// We know the pointer is aligned and the address space is known. Do the
321326
// optimized load.
@@ -353,20 +358,21 @@ group_load(Group g, InputIteratorT in_ptr,
353358
}
354359
}
355360
} else {
356-
return group_load(g, in_ptr, out, use_naive{});
361+
return group_load_impl(g, in_ptr, out, use_naive{});
357362
}
363+
364+
return;
358365
}
359366
}
360367

361-
// Store API span overload.
362368
template <typename Group, typename InputT, std::size_t ElementsPerWorkItem,
363369
typename OutputIteratorT,
364370
typename Properties = decltype(properties())>
365371
std::enable_if_t<detail::verify_store_types<InputT, OutputIteratorT> &&
366372
detail::is_generic_group_v<Group> &&
367373
is_property_list_v<Properties>>
368-
group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
369-
OutputIteratorT out_ptr, Properties props = {}) {
374+
group_store_impl(Group g, const span<InputT, ElementsPerWorkItem> in,
375+
OutputIteratorT out_ptr, Properties props = {}) {
370376
constexpr bool blocked = detail::isBlocked(props);
371377
using use_naive =
372378
detail::merged_properties_t<Properties,
@@ -379,7 +385,7 @@ group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
379385
group_barrier(g);
380386
return;
381387
} else if constexpr (!std::is_same_v<Group, sycl::sub_group>) {
382-
return group_store(g, in, out_ptr, use_naive{});
388+
return group_store_impl(g, in, out_ptr, use_naive{});
383389
} else {
384390
auto ptr = detail::get_block_op_ptr<ElementsPerWorkItem>(out_ptr, props);
385391

@@ -390,12 +396,12 @@ group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
390396
access::address_space::generic_space) {
391397
if (auto local_ptr = detail::dynamic_address_cast<
392398
access::address_space::local_space>(ptr)) {
393-
return group_store(g, in, local_ptr, props);
399+
return group_store_impl(g, in, local_ptr, props);
394400
} else if (auto global_ptr = detail::dynamic_address_cast<
395401
access::address_space::global_space>(ptr)) {
396-
return group_store(g, in, global_ptr, props);
402+
return group_store_impl(g, in, global_ptr, props);
397403
} else {
398-
return group_store(g, in, out_ptr, use_naive{});
404+
return group_store_impl(g, in, out_ptr, use_naive{});
399405
}
400406
} else {
401407
using block_info = typename detail::BlockTypeInfo<
@@ -406,8 +412,8 @@ group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
406412
constexpr int ReqAlign =
407413
detail::RequiredAlignment<detail::operation_type::store,
408414
deduced_address_space>::value;
409-
if (!detail::is_aligned<ReqAlign>(out_ptr))
410-
return group_store(g, in, out_ptr, use_naive{});
415+
if (!detail::is_aligned<ReqAlign>(out_ptr, props))
416+
return group_store_impl(g, in, out_ptr, use_naive{});
411417

412418
std::remove_const_t<remove_decoration_t<
413419
typename std::iterator_traits<OutputIteratorT>::value_type>>
@@ -424,10 +430,41 @@ group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
424430
sycl::bit_cast<block_op_type>(values));
425431
}
426432
} else {
427-
return group_store(g, in, out_ptr, use_naive{});
433+
return group_store_impl(g, in, out_ptr, use_naive{});
428434
}
429435
}
430436
}
437+
} // namespace detail
438+
439+
// Load API span overload.
440+
template <typename Group, typename InputIteratorT, typename OutputT,
441+
std::size_t ElementsPerWorkItem,
442+
typename Properties = decltype(properties())>
443+
std::enable_if_t<detail::verify_load_types<InputIteratorT, OutputT> &&
444+
detail::is_generic_group_v<Group> &&
445+
is_property_list_v<Properties>>
446+
group_load(Group g, InputIteratorT in_ptr,
447+
span<OutputT, ElementsPerWorkItem> out, Properties props = {}) {
448+
static_assert(std::is_pointer_v<InputIteratorT> ||
449+
!Properties::template has_property<alignment_key>(),
450+
"group_load requires a pointer if alignment property is used");
451+
detail::group_load_impl(g, in_ptr, out, props);
452+
}
453+
454+
// Store API span overload.
455+
template <typename Group, typename InputT, std::size_t ElementsPerWorkItem,
456+
typename OutputIteratorT,
457+
typename Properties = decltype(properties())>
458+
std::enable_if_t<detail::verify_store_types<InputT, OutputIteratorT> &&
459+
detail::is_generic_group_v<Group> &&
460+
is_property_list_v<Properties>>
461+
group_store(Group g, const span<InputT, ElementsPerWorkItem> in,
462+
OutputIteratorT out_ptr, Properties props = {}) {
463+
static_assert(std::is_pointer_v<OutputIteratorT> ||
464+
!Properties::template has_property<alignment_key>(),
465+
"group_store requires a pointer if alignment property is used");
466+
detail::group_store_impl(g, in, out_ptr, props);
467+
}
431468

432469
// Load API scalar.
433470
template <typename Group, typename InputIteratorT, typename OutputT,
@@ -437,7 +474,10 @@ std::enable_if_t<detail::verify_load_types<InputIteratorT, OutputT> &&
437474
is_property_list_v<Properties>>
438475
group_load(Group g, InputIteratorT in_ptr, OutputT &out,
439476
Properties properties = {}) {
440-
group_load(g, in_ptr, span<OutputT, 1>(&out, 1), properties);
477+
static_assert(std::is_pointer_v<InputIteratorT> ||
478+
!Properties::template has_property<alignment_key>(),
479+
"group_load requires a pointer if alignment property is used");
480+
detail::group_load_impl(g, in_ptr, span<OutputT, 1>(&out, 1), properties);
441481
}
442482

443483
// Store API scalar.
@@ -448,7 +488,11 @@ std::enable_if_t<detail::verify_store_types<InputT, OutputIteratorT> &&
448488
is_property_list_v<Properties>>
449489
group_store(Group g, const InputT &in, OutputIteratorT out_ptr,
450490
Properties properties = {}) {
451-
group_store(g, span<const InputT, 1>(&in, 1), out_ptr, properties);
491+
static_assert(std::is_pointer_v<OutputIteratorT> ||
492+
!Properties::template has_property<alignment_key>(),
493+
"group_store requires a pointer if alignment property is used");
494+
detail::group_store_impl(g, span<const InputT, 1>(&in, 1), out_ptr,
495+
properties);
452496
}
453497

454498
// Load API sycl::vec overload.
@@ -459,7 +503,10 @@ std::enable_if_t<detail::verify_load_types<InputIteratorT, OutputT> &&
459503
is_property_list_v<Properties>>
460504
group_load(Group g, InputIteratorT in_ptr, sycl::vec<OutputT, N> &out,
461505
Properties properties = {}) {
462-
group_load(g, in_ptr, span<OutputT, N>(&out[0], N), properties);
506+
static_assert(std::is_pointer_v<InputIteratorT> ||
507+
!Properties::template has_property<alignment_key>(),
508+
"group_load requires a pointer if alignment property is used");
509+
detail::group_load_impl(g, in_ptr, span<OutputT, N>(&out[0], N), properties);
463510
}
464511

465512
// Store API sycl::vec overload.
@@ -470,7 +517,11 @@ std::enable_if_t<detail::verify_store_types<InputT, OutputIteratorT> &&
470517
is_property_list_v<Properties>>
471518
group_store(Group g, const sycl::vec<InputT, N> &in, OutputIteratorT out_ptr,
472519
Properties properties = {}) {
473-
group_store(g, span<const InputT, N>(&in[0], N), out_ptr, properties);
520+
static_assert(std::is_pointer_v<OutputIteratorT> ||
521+
!Properties::template has_property<alignment_key>(),
522+
"group_store requires a pointer if alignment property is used");
523+
detail::group_store_impl(g, span<const InputT, N>(&in[0], N), out_ptr,
524+
properties);
474525
}
475526

476527
#else

0 commit comments

Comments
 (0)