Description
What is your question?
// Get the appropriate blocks for this thread block
auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _); // (m,n,k)
Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step<_1, X,_1>{}); // (BLK_M,BLK_K,k)
Tensor gB = local_tile(mB, cta_tiler, cta_coord, Step< X,_1,_1>{}); // (BLK_N,BLK_K,k)
Tensor gC = local_tile(mC, cta_tiler, cta_coord, Step<_1,_1, X>{}); // (BLK_M,BLK_N)
I am working through the CuTe GEMM tutorial (https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/media/docs/cute/0x_gemm_tutorial.md), but I do not know how to print the shape of gA. I tried "printf("%d, %d, %d\n", size<0>(gA), size<1>(gA), size<2>(gA));" and got "0, 512, 0", although I expected 128, 128, 8. Why?
I also tried "printf("%d\n", gA.size());" and got "524288". Why? Thanks!