unbreak CI for now #1822
Closed
Doesn't really help here, since the error bounds are pretty high and the broken test is already specific to `ConvTranspose` + `selu`. Can we specify a kind of failure we expect? Say, we expect the test to fail but not error?
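For reference, Julia's Test stdlib already has a primitive for exactly this "expected to fail" semantics; a minimal sketch (the expression here is a stand-in for the real GPU test):

```julia
using Test

# `@test_broken` records a Broken result when the expression is false
# (or throws), and reports an explicit "Unexpected Pass" failure once
# the expression starts passing again.
@test_broken 1 + 1 == 3
```

As the reply below notes, though, this only helps when the test fails deterministically.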
Nope, because the test doesn't always fail! It was a choice between this and skipping the test entirely.
I'd rather avoid having high error tolerances, since that isn't very helpful in the real world, and it's unlikely an error would be raised in this code path (although I'd rather retain the test). Something that doesn't error and gives inaccurate answers would be hard to debug! Can we compare against a standard (say TF/PyTorch) implementation?
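One way to do that without adding TF/PyTorch as test dependencies would be to compare the cuDNN path against Flux's own CPU path. A minimal sketch, with an illustrative layer shape and a hypothetical tolerance (not the values used in this PR):

```julia
using Flux, CUDA, Test

layer = ConvTranspose((3, 3), 4 => 6, selu)
x = randn(Float32, 8, 8, 4, 2)          # W×H×C×N input

y_cpu = layer(x)                        # pure-Julia reference path
y_gpu = cpu(gpu(layer)(gpu(x)))         # cuDNN path, copied back to host

@test isapprox(y_gpu, y_cpu; rtol = 1e-4)  # tolerance is illustrative
```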
We could, but that doesn't address the issue of very high variance in results. The main problem is that we (or at least I) can't figure out where that variance is coming from. It could be something deep within the bowels of cuDNN, and since the forward pass of `ConvTranspose` is literally one cuDNN call + broadcasted bias add + broadcasted activation, there'd be very little we could do about that.

All that said, I'm happy to change it to a `@test_skip` if you feel that's more appropriate.
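To make the "very little surface area" point concrete, here is roughly the shape of that forward pass (illustrative only; the exact internal helper names in Flux/NNlib differ):

```julia
using Flux

layer = ConvTranspose((3, 3), 4 => 6, selu)
x = randn(Float32, 8, 8, 4, 2)
y = layer(x)
# Internally this is approximately
#   selu.(∇conv_data(x, layer.weight, cdims) .+ bias)
# i.e. one NNlib/cuDNN data-gradient kernel, a broadcast bias add,
# and a broadcast activation -- little Flux-side code left to debug.
```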
Is the issue that we might see red CI spuriously, even though this test passes as-is on master? I don't think we've encountered that very frequently with the current setup, right?
I'm fairly certain that the underlying issue would be in CUDA/cuDNN, and that would be pretty out of our hands at that point. To fix this we'd need Julia kernels, which might not be the worst idea, but seeing as the motivation is to fix one combination of conv and activation, it's fair to say that it would be low priority with little overall benefit.
If we are to let this test stay here with a wide tolerance, it would be good to know which CUDA deps are referred to in the comment, and what we should do to be alerted when this combination works to a decent degree again.
That's the thing: I don't know, because I couldn't repro anything. It may well be that the CUDA deps are a red herring and the problem lies elsewhere (say, with GPUCompiler + the new LLVM + compiler changes on Julia 1.7). I've added a commit with the last `CUDA.versioninfo()` output I got out of Buildkite.
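For context, `CUDA.versioninfo()` is CUDA.jl's standard diagnostic dump:

```julia
using CUDA

CUDA.versioninfo()  # prints the CUDA toolkit and driver versions in use,
                    # related package versions, and the visible devices
```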
Thoughts? Concerns? Again, I'm happy to turn this into a `@test_skip` with the previous tolerance to get the PR merged.
Is there a functional difference here between `@test_skip` and the adjusted tolerances? They are so wide that I would expect a 200% relative tolerance to be tantamount to skipping any testing. So let's just change to `@test_skip` and merge.
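To spell out why the two are nearly equivalent, a quick sketch with made-up numbers:

```julia
using Test

reference = 1.0f0
result    = 2.9f0   # wildly off, ~190% relative error

# A 200% relative tolerance still accepts this:
# |2.9 - 1.0| = 1.9 <= 2.0 * max(|2.9|, |1.0|) = 5.8
@test isapprox(result, reference; rtol = 2.0)

# `@test_skip` doesn't even evaluate the expression; it just records a
# Broken result -- arguably the more honest way to say "untested":
@test_skip isapprox(result, reference; rtol = 1e-3)
```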
Done.