Support mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 #2943
more enhancement
LGTM
// for (int j = 0; j < 4; j++) {
// *d[0] +=
// static_cast<CDType>(ra[j]) * static_cast<CDType>(rb[j]);
// *d[1] += static_cast<CDType>(ra[j]) *
// static_cast<CDType>(rb[j + 4]);
// *d[2] += static_cast<CDType>(ra[j + 4]) *
// static_cast<CDType>(rb[j]);
// *d[3] += static_cast<CDType>(ra[j + 4]) *
// static_cast<CDType>(rb[j + 4]);
Can we remove those lines?
Fixed.
// fragments and adds it to the corresponding D matrix fragment d0
// += row0{ a0, a1, a2, a3 } * col0{ b0, b1, b2, b3 } d1 += row0{
// a0, a1, a2, a3 } * col1{ b0, b1, b2, b3 } d2 += row1{ a0, a1,
// a2, a3 } * col0{ b0, b1, b2, b3 } d3 += row1{ a0, a1, a2, a3 } *
// col1{ b0, b1, b2, b3 }
Can we format those lines, like line 2662?
Fixed.
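For readers following the two threads above, the accumulation that the quoted comment describes (d0 through d3 built from two rows of A and two columns of B) can be sketched as a scalar host-side function. This is only an illustration under stated assumptions, not the code in this PR: the name mma_step_sketch is hypothetical, float stands in for CDType, and the layout (row0 in ra[0..3], row1 in ra[4..7], col0 in rb[0..3], col1 in rb[4..7]) is inferred from the commented-out loop quoted earlier.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of one accumulation step (not the PR's code).
// Assumed layout, taken from the commented-out loop quoted in the review:
//   row0 = ra[0..3], row1 = ra[4..7], col0 = rb[0..3], col1 = rb[4..7].
// float stands in for the accumulator type (CDType in the quoted code).
inline std::array<float, 4> mma_step_sketch(const std::array<float, 8> &ra,
                                            const std::array<float, 8> &rb,
                                            std::array<float, 4> d) {
  for (std::size_t j = 0; j < 4; ++j) {
    d[0] += ra[j] * rb[j];         // d0 += row0 * col0
    d[1] += ra[j] * rb[j + 4];     // d1 += row0 * col1
    d[2] += ra[j + 4] * rb[j];     // d2 += row1 * col0
    d[3] += ra[j + 4] * rb[j + 4]; // d3 += row1 * col1
  }
  return d;
}
```

Each d value is a four-element dot product, so a comment laid out with one "dN += rowX * colY" statement per line maps directly onto the four statements in the loop body.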
Minor fixes.
Fix LIT test and refine the comments
mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16.
The LIT test and the E2E test (SYCLomatic-test:#935) pass.