This repository was archived by the owner on Apr 28, 2023. It is now read-only.
Regular synchronization should never appear underneath a thread mapping
since the synchronization should be performed by all threads and
the mapping to threads may leave some thread instances unmapped.
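The sketch below illustrates why this matters. Python threads stand in for the threads of a GPU block, and `threading.Barrier` stands in for block-level synchronization such as CUDA's `__syncthreads()`. The function `sync_completes` and the guard `tid < mapped_extent` are illustrative assumptions, not code from this repository. On a GPU, a barrier inside a condition that does not hold for the whole block is likely to hang. The sketch instead gives the barrier a timeout so that the failure can be observed.

```python
import threading


def sync_completes(num_threads, mapped_extent, sync_inside_mapping, timeout=0.2):
    """Simulate one block of num_threads threads in which only threads
    with tid < mapped_extent are mapped to statement instances.
    Returns True if the block-wide synchronization completed."""
    # Like __syncthreads(), this barrier expects every thread of the block.
    barrier = threading.Barrier(num_threads, timeout=timeout)

    def body(tid):
        try:
            if tid < mapped_extent:  # test on the thread identifier (the mapping)
                # ... mapped statement instances would execute here ...
                if sync_inside_mapping:
                    barrier.wait()  # only mapped threads ever arrive here
            if not sync_inside_mapping:
                barrier.wait()  # reached by every thread, mapped or not
        except threading.BrokenBarrierError:
            pass  # barrier timed out; a GPU block would likely hang instead

    threads = [threading.Thread(target=body, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not barrier.broken


print(sync_completes(8, 8, sync_inside_mapping=True))   # True: all threads mapped
print(sync_completes(8, 5, sync_inside_mapping=True))   # False: 3 threads never arrive
print(sync_completes(8, 5, sync_inside_mapping=False))  # True: barrier outside the guard
```

The synchronization completes in two cases: when all 8 threads are mapped, or when the barrier sits outside the guard. With only 5 of 8 threads mapped and the barrier inside the guard, it times out.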
Inserting reduction synchronization underneath the mapping was
apparently deemed safe because the partial tile separation makes
sure only complete blocks are mapped to reductions.
However, with the synchronization inside the mapping,
the isl AST generator may generate tests enclosing this
synchronization that involve thread identifiers (even if the
user knows that those same conditions could be expressed
without thread identifiers, in combination with
other constraints in the code).
Insert the synchronization outside the mapping to prevent
this from happening. This also means that the reduction
member no longer needs to be split off, such that
the thread mapping now always corresponds to a single band.
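The partial tile separation that the reduction handling relies on can be pictured with a small sketch. It assumes a one-dimensional loop and a block of `block` threads. The iteration range is split into complete tiles, each of which gives exactly one iteration to every thread. At most one trailing partial tile remains, and in it some threads stay unmapped. `separate_tiles` is a hypothetical helper: it only illustrates the idea and does not reflect how the compiler represents tiles.

```python
def separate_tiles(n, block):
    """Split the iterations [0, n) into complete tiles of `block`
    iterations plus at most one trailing partial tile."""
    full_end = (n // block) * block
    # In a complete tile, thread tid executes iteration start + tid,
    # so every thread of the block has exactly one instance.
    complete = [range(start, start + block) for start in range(0, full_end, block)]
    # Remaining iterations: fewer than `block`, so some threads are unmapped.
    partial = range(full_end, n)  # empty when block divides n
    return complete, partial


complete, partial = separate_tiles(10, 4)
print([list(t) for t in complete])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(list(partial))                # [8, 9]
```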
Note that while the partial tile separation makes sure
that only complete blocks are mapped to reductions,
multiple such complete blocks may still be covered
by the thread mapping, including in the parallel
directions. The current reduction handling does not
support this as it stores the partial reductions
in a single (per-thread) scalar variable.
The band mapped to threads therefore needs to be tiled
first such that it contains exactly one complete block
in the parallel directions.
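The sketch below shows both the problem and the tiling fix. It assumes a row-sum reduction `out[i] = sum_j A[i][j]`. Rows (the parallel direction) are mapped to `threads_y` threads, and columns (the reduction direction) are mapped to `threads_x` threads. Plain loops stand in for the thread mapping, and `sum(partials)` stands in for the block-level reduction. All names are hypothetical, and this is not the compiler's generated code.

The first function keeps one scalar per thread even when a thread covers several rows. The second tiles the rows so that each tile holds exactly one complete block in the parallel direction.

```python
def reduce_rows_single_scalar(A, threads_y, threads_x):
    """One scalar per thread, even if the mapping gives it several rows."""
    n, m = len(A), len(A[0])
    out = [0] * n
    for ty in range(threads_y):
        partials = []
        for tx in range(threads_x):
            acc = 0  # the single per-thread scalar
            for i in range(ty, n, threads_y):  # several rows if n > threads_y
                for j in range(tx, m, threads_x):
                    acc += A[i][j]
            partials.append(acc)
        block_sum = sum(partials)  # stands in for the block-level reduction
        for i in range(ty, n, threads_y):
            out[i] = block_sum  # rows sharing a thread get the same mixed value
    return out


def reduce_rows_tiled(A, threads_y, threads_x):
    """Tile the parallel direction so each tile has one row per thread."""
    n, m = len(A), len(A[0])
    # Assumes only complete blocks reach this code (partial tile separation).
    assert n % threads_y == 0, "sketch only handles complete blocks"
    out = [0] * n
    for tile in range(0, n, threads_y):  # tile loop, outside the thread mapping
        for ty in range(threads_y):
            i = tile + ty  # exactly one row per thread in this tile
            partials = []
            for tx in range(threads_x):
                acc = 0  # per-thread scalar, now covering a single row
                for j in range(tx, m, threads_x):  # may still span several columns
                    acc += A[i][j]
                partials.append(acc)
            out[i] = sum(partials)  # stands in for the block-level reduction
    return out


A = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(reduce_rows_single_scalar(A, threads_y=2, threads_x=2))  # [8, 12, 8, 12]
print(reduce_rows_tiled(A, threads_y=2, threads_x=2))          # [2, 4, 6, 8]
```

Only the parallel direction is restricted to one block per tile. A thread may still accumulate several columns of the reduction direction into its scalar, which is harmless because those columns all belong to the same output element.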