Why not [RSC] but [C/64, R, S, 64] in kloop of conv implicit gemm? #1797

@liuqi123123

Description

Suppose R=2, S=2, C=128, and ThreadBlockShape=<128,128,64>.
In the main loop over GEMM_K in implicit GEMM, the memory access sequence in CUTLASS is: r=0,s=0,c=0-63; r=0,s=1,c=0-63; r=1,s=0,c=0-63; r=1,s=1,c=0-63; r=0,s=0,c=64-127; r=0,s=1,c=64-127; r=1,s=0,c=64-127; r=1,s=1,c=64-127.
Why not use this order instead: r=0,s=0,c=0-63; r=0,s=0,c=64-127; r=0,s=1,c=0-63; r=0,s=1,c=64-127; r=1,s=0,c=0-63; r=1,s=0,c=64-127; r=1,s=1,c=0-63; r=1,s=1,c=64-127? Isn't accessing C first a better memory access strategy?
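To make the two orders concrete, here is a minimal Python sketch that enumerates them for the sizes above. It is only an illustration of the loop nesting described in this issue, not code taken from CUTLASS; the function names `channel_tile_outermost` and `channel_tile_innermost` and the `tile_k` parameter are hypothetical, and C is assumed to be a multiple of `tile_k`.

```python
from itertools import product


def channel_tile_outermost(R, S, C, tile_k):
    """Order attributed to CUTLASS in this issue: [C/tile_k, R, S, tile_k].

    The channel tile is the outermost loop, so all (r, s) filter positions
    are visited before moving on to the next block of channels.
    """
    return [(r, s, c0, c0 + tile_k - 1)
            for c0 in range(0, C, tile_k)
            for r, s in product(range(R), range(S))]


def channel_tile_innermost(R, S, C, tile_k):
    """Order proposed in this issue: plain [R, S, C].

    The channel tile is the innermost loop, so every channel block of one
    (r, s) filter position is visited before moving to the next position.
    """
    return [(r, s, c0, c0 + tile_k - 1)
            for r, s in product(range(R), range(S))
            for c0 in range(0, C, tile_k)]


if __name__ == "__main__":
    # Sizes from the issue: R=2, S=2, C=128, threadblock K extent of 64.
    for name, fn in [("channel tile outermost", channel_tile_outermost),
                     ("channel tile innermost", channel_tile_innermost)]:
        print(name)
        for r, s, lo, hi in fn(2, 2, 128, 64):
            print(f"  r={r}, s={s}, c={lo}-{hi}")
```

Both functions visit the same eight (r, s, channel-block) tiles; only the order of the steps along GEMM_K differs, which is exactly the point of the question.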
