I'm reading the CUTLASS 3.9 source code to understand how GEMM with MX types works on Blackwell hardware.
First, I see that tcgen05.mma.mxf8f6f4 supports N in {8, 16, 24, …, 256} for 1SM MMA and {16, 32, 48, ..., 256} for 2SM MMA. The CUTLASS API reflects this: the collective builders accept those N values for non-block-scaled types (e.g., f8f6f4). However, I can't see why N is restricted to only a few of those values in the block-scaled path, as in:
sm100_make_blockscaled_1sm_trivial_tiled_mma() {
  // ...
  // Do not allow a tiled MMA N mode > 1, as that is not reasonable.
  constexpr int N = cute::size<1>(TileShape_MNK{});
  static_assert(N == 64 || N == 128 || N == 192 || N == 256, "Invalid TileShape_N.");
  // ...
}
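To make the gap concrete, the two sets of N values described above can be written as compile-time predicates. This is only a sketch that restates the ranges quoted in this issue; the function names are hypothetical and are not part of CUTLASS, and any dependence of the valid N range on the M mode is not modeled.

```cpp
// Sketch only: restates the N ranges quoted in this issue.
// All function names are hypothetical and not part of the CUTLASS API.

// N values quoted for the 1SM instruction: {8, 16, 24, ..., 256}.
constexpr bool is_mma_n_1sm(int n) {
  return n >= 8 && n <= 256 && n % 8 == 0;
}

// N values quoted for the 2SM instruction: {16, 32, 48, ..., 256}.
constexpr bool is_mma_n_2sm(int n) {
  return n >= 16 && n <= 256 && n % 16 == 0;
}

// N values accepted by the static_assert in the block-scaled builder.
constexpr bool is_blockscaled_tile_n(int n) {
  return n == 64 || n == 128 || n == 192 || n == 256;
}

// N = 96 is in the quoted 1SM and 2SM instruction ranges,
// but the block-scaled builder's static_assert rejects it.
static_assert(is_mma_n_1sm(96) && is_mma_n_2sm(96), "96 is in both quoted ranges");
static_assert(!is_blockscaled_tile_n(96), "96 is rejected by the block-scaled builder");
```

Every value accepted by the block-scaled predicate also satisfies the two instruction-level predicates, so the builder's set is a strict subset of what the instruction is quoted as supporting.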
I'm tuning my GEMM library across a wide range of tile N values so that I can pick the one that best fills all SMs, so this restriction falls short for that purpose.
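As a rough illustration of why the choice of tile N matters for occupancy, the sketch below estimates how evenly a problem's output tiles fill the SMs. It assumes one CTA per SM, counts only whole waves of tiles, and ignores padding inside partial tiles; the function names and the efficiency metric are hypothetical and do not come from CUTLASS.

```cpp
#include <cstdint>

// Sketch only: a simple wave-quantization estimate, assuming one CTA per SM.
// All names here are hypothetical and not part of the CUTLASS API.

constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

// Fraction of SM "slots" that hold an output tile across all waves.
// 1.0 means the last wave is completely full.
constexpr double wave_efficiency(std::int64_t M, std::int64_t N,
                                 int tile_m, int tile_n, int num_sms) {
  const std::int64_t tiles = ceil_div(M, tile_m) * ceil_div(N, tile_n);
  const std::int64_t waves = ceil_div(tiles, num_sms);
  return static_cast<double>(tiles) / static_cast<double>(waves * num_sms);
}

// Pick the tile N with the best estimate among the values the
// block-scaled builder accepts; ties go to the larger tile.
constexpr int best_blockscaled_tile_n(std::int64_t M, std::int64_t N,
                                      int tile_m, int num_sms) {
  const int candidates[] = {64, 128, 192, 256};
  int best = candidates[0];
  double best_eff = wave_efficiency(M, N, tile_m, best, num_sms);
  for (int i = 1; i < 4; ++i) {
    const double eff = wave_efficiency(M, N, tile_m, candidates[i], num_sms);
    if (eff >= best_eff) {
      best = candidates[i];
      best_eff = eff;
    }
  }
  return best;
}
```

With only four candidates, many problem shapes leave the final wave partly empty; a finer set of tile N values gives this kind of search more room to find a full last wave.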
Also, is it good practice to place the weights in B and the activations in A for an FC layer?
There is no fundamental limitation coming from the MMA itself. These tile shape restrictions are a result of the scale factor layouts in gmem and their interaction with TMA loads.