Skip to content

Further optimize gemm #13

@MJChku

Description

@MJChku

Thanks for your great work!

I plan to work on bnn optimization as well for various application (generative model/classifier) on a powerful cpu.
I did preliminary work for a few hours to change the "micro_kernel" to use avx512, and it showed 4x speed up for simple one loop optimization (note -O3 won't do the optimization to vectorize). I wonder if you plan to work on this further ? and boost the performance further.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions