What is your question?
Dear cutlass team,
Happy new year!
I would like to ask the question in the title. So far I have only found that the shared memory read (LDSM, i.e. ldmatrix) is done by MmaTensorOpMultiplicandTileIterator. As far as I know, the bank-conflict-free shared memory write through the swizzled layout should happen in RegularTileAccessIterator, but as far as I can tell it is not implemented there. Could you please point me to the code location where the shared memory write with the swizzled layout happens in CUTLASS 2.x?
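For reference, here is my understanding of what a swizzled shared memory write should compute. This is a minimal host-side sketch of a generic XOR swizzle that I wrote for illustration. It is not CUTLASS code, and the names (swizzled_offset, linear_offset, distinct_bank_groups) and the geometry (128-byte rows made of eight 16-byte vectors, 32 banks of 4 bytes) are my own assumptions:

```cpp
#include <cassert>
#include <set>

// Assumed geometry (hypothetical, for illustration only):
constexpr int kVectorsPerRow = 8;  // 8 x 16 B = 128 B per row
constexpr int kVectorBytes = 16;   // one 128-bit vector per thread access
constexpr int kNumBanks = 32;      // shared memory banks
constexpr int kBankBytes = 4;      // bytes per bank

// Plain row-major byte offset of logical vector (row, col).
int linear_offset(int row, int col) {
  return (row * kVectorsPerRow + col) * kVectorBytes;
}

// XOR swizzle: permute the vector column within a row using the row index.
int swizzled_offset(int row, int col) {
  int physical_col = col ^ (row % kVectorsPerRow);
  return (row * kVectorsPerRow + physical_col) * kVectorBytes;
}

// First of the four banks touched by a 16-byte vector at this byte offset.
int first_bank(int byte_offset) {
  return (byte_offset / kBankBytes) % kNumBanks;
}

// Number of distinct 16-byte bank groups hit when 8 threads each write
// logical column `col` of rows 0..7 using the given offset function.
template <typename OffsetFn>
int distinct_bank_groups(int col, OffsetFn offset) {
  std::set<int> banks;
  for (int row = 0; row < kVectorsPerRow; ++row) {
    banks.insert(first_bank(offset(row, col)));
  }
  return static_cast<int>(banks.size());
}
```

With the plain layout, distinct_bank_groups(3, linear_offset) is 1 (all eight writes hit the same four banks), while distinct_bank_groups(3, swizzled_offset) is 8, so eight threads writing one logical column land in different banks. This is the kind of address mapping I expected to find on the write side.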
Thanks.