Summary:
Pull Request resolved: #4368
X-link: facebookresearch/FBGEMM#1436
Add two shapes for the OC OBA model to improve performance with the Triton FP8 non-persistent kernel, as suggested by the [log](https://www.internalfb.com/intern/paste/P1843910441/).
With these shapes, the Triton kernel is almost on par with the PyTorch rowwise kernel:
| fp8 kernel | Flops | Time per iter | QPS |
|---|---|---|---|
| pytorch rowwise | 304.07 | 35.23ms | 87205.84 |
| triton (without added shapes) | 292.83 | 36.58ms | 83982.15 |
| triton (with the added shapes) | 302.63 | 35.39ms | 86793.45 |
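
To make the "almost on par" claim concrete, the short sketch below normalizes each row of the table against the PyTorch rowwise baseline. The numbers are copied from the table; the dictionary layout and the `relative_to_baseline` helper are illustrative names, not part of FBGEMM.

```python
# Figures copied from the benchmark table in this commit message.
# The dict keys and the helper below are hypothetical, for illustration only.
results = {
    "pytorch rowwise": {"flops": 304.07, "ms_per_iter": 35.23, "qps": 87205.84},
    "triton (without added shapes)": {"flops": 292.83, "ms_per_iter": 36.58, "qps": 83982.15},
    "triton (with the added shapes)": {"flops": 302.63, "ms_per_iter": 35.39, "qps": 86793.45},
}


def relative_to_baseline(results, baseline="pytorch rowwise"):
    """Return each metric divided by the same metric of the baseline row."""
    base = results[baseline]
    return {
        name: {metric: row[metric] / base[metric] for metric in row}
        for name, row in results.items()
    }


ratios = relative_to_baseline(results)
for name, r in ratios.items():
    print(
        f"{name}: flops {r['flops']:.1%}, "
        f"time per iter {r['ms_per_iter']:.1%}, qps {r['qps']:.1%}"
    )
```

By this measure the Triton kernel goes from roughly 96.3% to roughly 99.5% of the PyTorch rowwise Flops figure once the two shapes are added.
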
Reviewed By: njriasan, karthik-man
Differential Revision: D76631650
fbshipit-source-id: 4a1324302f5ca635d801242d0cca205a33f41c94