Currently, CUTLASS only implements a specialization of `atomic_add` for `half2`, but not for `nv_bfloat162`. This in turn means `BlockStripedReduce` can be specialized for `half2` but not for `nv_bfloat162`.
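For concreteness, here is a minimal sketch of the kind of specialization I have in mind. It is not CUTLASS code: `atomic_add_sketch`, `accumulate`, and `unpack` are hypothetical names made up for illustration, and the sketch assumes the `atomicAdd` overload for `nv_bfloat162` from `cuda_bf16.h`, which as far as I know requires compute capability 8.0 or higher (build with something like `nvcc -arch=sm_80`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_bf16.h>

// Hypothetical stand-in for a generic atomic-add functor (NOT the CUTLASS definition).
template <typename T>
struct atomic_add_sketch;

// Sketch of a specialization for nv_bfloat162.
// Assumption: atomicAdd(nv_bfloat162*, nv_bfloat162) is only available on sm_80+.
template <>
struct atomic_add_sketch<nv_bfloat162> {
  __device__ void operator()(nv_bfloat162 *ptr, const nv_bfloat162 &data) const {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 800)
    // Each bf16 lane is added atomically; the pair is not guaranteed
    // to be updated as a single 32-bit atomic access.
    atomicAdd(ptr, data);
#else
    // No-op on the host pass and on older architectures.
    (void)ptr;
    (void)data;
#endif
  }
};

// Every thread adds (1, 2) into the same packed bf16 pair.
__global__ void accumulate(nv_bfloat162 *out) {
  atomic_add_sketch<nv_bfloat162> add;
  add(out, __floats2bfloat162_rn(1.0f, 2.0f));
}

// Convert the packed result to float2 on the device for easy readback.
__global__ void unpack(const nv_bfloat162 *in, float2 *out) {
  *out = __bfloat1622float2(*in);
}

int main() {
  nv_bfloat162 *acc = nullptr;
  float2 *result = nullptr;
  cudaMalloc(&acc, sizeof(nv_bfloat162));
  cudaMalloc(&result, sizeof(float2));
  cudaMemset(acc, 0, sizeof(nv_bfloat162));  // all-zero bits encode (0.0, 0.0)

  accumulate<<<1, 64>>>(acc);
  unpack<<<1, 1>>>(acc, result);

  float2 host = {0.0f, 0.0f};
  cudaMemcpy(&host, result, sizeof(float2), cudaMemcpyDeviceToHost);
  std::printf("low = %f, high = %f\n", host.x, host.y);

  cudaFree(acc);
  cudaFree(result);
  return 0;
}
```

The thread count is kept small so that the running sums stay within the range of integers that bf16 represents exactly, which makes the result independent of the order in which the atomics land. An actual CUTLASS specialization might prefer an inline PTX reduction over the `atomicAdd` intrinsic; this sketch makes no claim about which is better.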
Is there any reason not to provide a specialization for `nv_bfloat162`? It looks like a very simple change, but maybe I'm missing something. Thanks in advance for the help!