
[SM 6.9] Fix OuterProductAccumulate FP32 Accumulator case in ExecTest. #7482


Conversation

VladM1076

The switch that sets SrcEltSize and DestEltSize is missing an FP32 case.
As a result, the matrix buffer is not initialized to all 1.0s, and the tests fail because the result differs from the expected value by -1.0.

Verified correctness with NVIDIA internal driver build.
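
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the ExecTest code: `AccumType`, `destEltSize`, `runOuterProductAccumulate`, and the `HandleFP32` toggle are hypothetical names invented for this example, and the byte sizes are assumptions. It only shows how a switch with no FP32 case can leave the accumulator matrix at 0.0 instead of 1.0, which shifts every accumulated result by -1.0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulator types; not the identifiers used in the real test.
enum class AccumType { FP16, FP32 };

// Hypothetical helper: element size in bytes, or 0 when the type is unhandled.
// HandleFP32 = false mimics a switch that has no FP32 case.
std::size_t destEltSize(AccumType Type, bool HandleFP32) {
  switch (Type) {
  case AccumType::FP16:
    return 2;
  case AccumType::FP32:
    return HandleFP32 ? sizeof(float) : 0;
  }
  return 0;
}

// Initializes the matrix to 1.0 only when the element size is known, then
// accumulates the outer product of A and B into it (row-major layout).
std::vector<float> runOuterProductAccumulate(AccumType Type, bool HandleFP32,
                                             const std::vector<float> &A,
                                             const std::vector<float> &B) {
  std::vector<float> Matrix(A.size() * B.size(), 0.0f);
  if (destEltSize(Type, HandleFP32) != 0)
    for (float &Value : Matrix)
      Value = 1.0f;
  for (std::size_t I = 0; I < A.size(); ++I)
    for (std::size_t J = 0; J < B.size(); ++J)
      Matrix[I * B.size() + J] += A[I] * B[J];
  return Matrix;
}
```

With `A = {1, 2}` and `B = {3, 4}`, the handled case yields `{4, 5, 7, 9}`, while the unhandled case yields `{3, 4, 6, 8}`: each element is lower by exactly 1.0, matching the symptom reported in the description.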

Contributor

github-actions bot commented May 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@VladM1076 VladM1076 changed the title [SM 6.9} Fix OuterProductAccumulate FP32 Accumulator case in ExecTest. [SM 6.9] Fix OuterProductAccumulate FP32 Accumulator case in ExecTest. May 24, 2025
@damyanp damyanp enabled auto-merge (squash) May 24, 2025 00:34
@damyanp damyanp merged commit d73a9f5 into microsoft:staging-sm6.9 May 24, 2025
13 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap May 24, 2025
@VladM1076
Author

TY for quick review, have a good weekend!
