Performance doubles by only changing one line of code #18

@xiefan46

Hi xiandong,

Thanks for providing this amazing tutorial! I have recently been working on reduce0, and I found that I can double the performance of the reduce_v0_baseline.cu kernel simply by changing blockDim.x to THREAD_PER_BLOCK in the for loop.

before

Image

profile result:

Image

after

Image

profile result:

Image

I guess this is because of loop unrolling? It's quite interesting that such a small change makes such a big difference.
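
Since the screenshots do not survive as text, here is a minimal sketch of what the two variants might look like. It assumes the baseline is the usual interleaved-addressing shared-memory reduction and that THREAD_PER_BLOCK is a compile-time macro equal to the launch block size (256 is an assumed value). The kernel names and the `check_variant` host helper are hypothetical, added only for illustration. One plausible explanation for the speedup: with a constant loop bound the compiler knows the trip count, can fully unroll the loop, and can then treat `2 * s` as a constant power of two in each iteration, so `tid % (2 * s)` no longer needs a run-time integer modulo.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

#define THREAD_PER_BLOCK 256  // assumed value; must match the launch block size

// "Before" (assumed form): the loop bound is read from blockDim.x at run time,
// so the compiler does not know the trip count.
__global__ void reduce0_runtime_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// "After" (assumed form): the only change is the loop bound, now a
// compile-time constant.
__global__ void reduce0_const_bound(const float *d_in, float *d_out) {
    __shared__ float sdata[THREAD_PER_BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = d_in[i];
    __syncthreads();
    for (unsigned int s = 1; s < THREAD_PER_BLOCK; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) d_out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: runs one variant on an all-ones input and returns
// true if every block's partial sum equals THREAD_PER_BLOCK.
bool check_variant(bool use_const_bound, int num_blocks) {
    assert(num_blocks > 0);
    const int n = num_blocks * THREAD_PER_BLOCK;
    std::vector<float> h_in(n, 1.0f), h_out(num_blocks, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, num_blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    if (use_const_bound) {
        reduce0_const_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    } else {
        reduce0_runtime_bound<<<num_blocks, THREAD_PER_BLOCK>>>(d_in, d_out);
    }
    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, num_blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (float v : h_out) {
        if (v != static_cast<float>(THREAD_PER_BLOCK)) return false;
    }
    return true;
}
```

Both variants should produce identical results as long as the kernel is launched with exactly THREAD_PER_BLOCK threads per block. The constant-bound version silently depends on that match, which is the trade-off for giving the compiler more information.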
