v2.2.3 - SplitU and WorkGroupMapping
SplitU
If the summation is large but the C tensor is small, splitting up the summation creates extra parallelism. This allows smaller C tensors to fill up larger GPUs.
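The idea can be illustrated with a minimal pure-Python sketch. It is not Tensile's kernel code: the function name `matmul_split_u` and the interleaved way the summation index is divided are assumptions made for illustration. Each of the `split_u` partial results is independent of the others, so on a GPU they could be computed in parallel before a final reduction.

```python
def matmul_split_u(a, b, split_u):
    """Compute C = A * B with the summation index split into `split_u` parts.

    Hypothetical illustration only; names and the interleaved split are
    assumptions, not Tensile's implementation.
    """
    m, k, n = len(a), len(b), len(b[0])

    # One partial C tensor per split. Each partial is independent of the
    # others, which is where the extra parallelism comes from.
    partials = []
    for s in range(split_u):
        partial = [[0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                acc = 0
                # This split handles summation indices s, s + split_u, ...
                for u in range(s, k, split_u):
                    acc += a[i][u] * b[u][j]
                partial[i][j] = acc
        partials.append(partial)

    # Final reduction: add the partial tensors element by element.
    return [[sum(p[i][j] for p in partials) for j in range(n)]
            for i in range(m)]


# A 1x1 C tensor with a summation of length 8: without splitting there is
# only one output element to work on; with split_u=4 there are four
# independent partial sums.
a = [[1, 2, 3, 4, 5, 6, 7, 8]]
b = [[1], [1], [1], [1], [1], [1], [1], [1]]
c = matmul_split_u(a, b, split_u=4)
```

The trade-off in this sketch is the extra reduction step at the end, which is only worthwhile when C alone offers too little work.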
WorkGroupMapping
WorkGroupMapping changes which work-groups operate on which tiles of tensor C. This can help performance by improving cache reuse.
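As a rough sketch of what such a remapping can look like, the function below assigns consecutive work-group IDs to tiles of C in bands of `group_width` tile columns. The function name, its parameters, and this particular banded ordering are assumptions chosen for illustration; the actual mapping used by WorkGroupMapping may differ.

```python
def map_work_group(wg_id, tiles_m, tiles_n, group_width):
    """Map a linear work-group ID to a (tile_row, tile_col) of tensor C.

    Hypothetical illustration: tiles are visited in vertical bands that are
    `group_width` tile columns wide, so work-groups with nearby IDs touch
    nearby tiles.
    """
    band_size = group_width * tiles_m          # tiles in one full band
    band = wg_id // band_size                  # which band this ID falls in
    in_band = wg_id % band_size                # position inside that band
    first_col = band * group_width
    # The last band may be narrower if tiles_n is not a multiple of group_width.
    width = min(group_width, tiles_n - first_col)
    tile_row = in_band // width
    tile_col = first_col + in_band % width
    return tile_row, tile_col


# 2 x 3 grid of tiles, bands of 2 tile columns.
order = [map_work_group(w, tiles_m=2, tiles_n=3, group_width=2)
         for w in range(6)]
```

With `group_width` equal to `tiles_n` this reduces to plain row-major order; smaller values keep consecutive work-groups within a narrower region of C, which is the kind of locality that can help caching.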