This repository was archived by the owner on Apr 28, 2023. It is now read-only.
The original implementation of memory promotion ignored strides in
accesses to simplify the code. isl recently introduced support for
extracting stride information from sets, making stride manipulation easy
in TC. Introduce special handling for strided accesses in
TensorReferenceGroup and related classes. An access is strided if the
access function has the shape
(a_i - offset_i) = 0 mod stride_i
where stride_i is a constant and offset_i is some affine expression on
the iteration domain. Use isl to compute offsets and strides in access
relations. Use this information to promote to shared memory only those
tensor elements that are actually read in case of strided accesses.
This decreases the amount of shared memory used by a kernel with such
accesses.
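
As an illustration only, the effect on the footprint size can be sketched
in plain Python for a one-dimensional access. This is not the TC or isl
code: the helper names `stride_and_offset` and `footprint_sizes` are
hypothetical, and the stride is recovered here from an explicit list of
indices with a gcd rather than from an isl set.

```python
from functools import reduce
from math import gcd


def stride_and_offset(indices):
    """Return (stride, offset) such that (a - offset) % stride == 0
    holds for every accessed index a. Hypothetical helper, not an isl call."""
    ordered = sorted(set(indices))
    base = ordered[0]
    # gcd of all distances to the smallest index; 0 if there is one element.
    stride = reduce(gcd, (a - base for a in ordered[1:]), 0)
    if stride == 0:
        return 1, 0  # a single element: trivially stride 1, offset 0
    return stride, base % stride


def footprint_sizes(indices):
    """Return (dense, strided) element counts for the promoted buffer."""
    lo, hi = min(indices), max(indices)
    stride, _ = stride_and_offset(indices)
    dense = hi - lo + 1               # bounding box, strides ignored
    strided = (hi - lo) // stride + 1  # only elements on the stride lattice
    return dense, strided


# Access A[4*i + 1] for 0 <= i < 8.
accessed = [4 * i + 1 for i in range(8)]
stride, offset = stride_and_offset(accessed)
dense, strided = footprint_sizes(accessed)
print(stride, offset, dense, strided)
```

In this example the strided footprint holds 8 elements where the
bounding box of the same access would hold 29.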
This also prepares for the introduction of register promotion, where
the accesses of each individual thread are strided, with the stride
equal to the number of threads.
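
A rough sketch of the per-thread situation, assuming a hypothetical
cyclic mapping in which thread t touches elements t, t + T, t + 2T, ...
of a one-dimensional tensor (the names `NUM_THREADS`, `N` and
`thread_elements` are made up for the example and do not come from TC):

```python
NUM_THREADS = 4  # assumed number of threads T
N = 16           # assumed tensor size


def thread_elements(t):
    """Indices touched by thread t under the assumed cyclic mapping."""
    return list(range(t, N, NUM_THREADS))


for t in range(NUM_THREADS):
    elems = thread_elements(t)
    # Every index of thread t satisfies (a - t) % NUM_THREADS == 0,
    # i.e. the access is strided with stride T and offset t.
    assert all((a - t) % NUM_THREADS == 0 for a in elems)
    dense = elems[-1] - elems[0] + 1  # registers needed if strides are ignored
    strided = len(elems)              # registers needed with strides
    print(t, elems, dense, strided)
```

Under these assumptions each thread needs 4 registers instead of the 13
that its bounding box would suggest.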
Note that references accessing disjoint sets of elements with strides
are not grouped even if their non-strided footprints overlap, e.g.
A[2*i] and A[2*i + 1] belong to different groups. This may decrease the
benefit of coalesced reads when copying between global and shared
memory. At the same time, it also decreases the required shared memory
size, making it possible to promote one of the references in cases where a group
with two references would not fit. The profitability of such grouping
requires further exploration.
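
The grouping criterion can be pictured with the following sketch, which
uses explicit Python sets in place of isl sets. The function names are
hypothetical and the overlap test is a simplification of whatever
TensorReferenceGroup does internally.

```python
def boxes_overlap(a, b):
    """True if the non-strided (bounding box) footprints overlap."""
    return min(a) <= max(b) and min(b) <= max(a)


def should_group(a, b):
    """Group two references only if they share at least one element."""
    return not a.isdisjoint(b)


even = {2 * i for i in range(8)}         # A[2*i],     0 <= i < 8
odd = {2 * i + 1 for i in range(8)}      # A[2*i + 1], 0 <= i < 8
shifted = {2 * i + 2 for i in range(8)}  # A[2*i + 2], 0 <= i < 8

print(boxes_overlap(even, odd), should_group(even, odd))
print(boxes_overlap(even, shifted), should_group(even, shifted))
```

Here A[2*i] and A[2*i + 1] have overlapping bounding boxes but no common
element, so they stay in separate groups of 8 elements each, whereas
A[2*i] and A[2*i + 2] share elements and would be grouped.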