Closed
Description
One example: https://buildkite.com/julialang/flux-dot-jl/builds/1914#c62d9761-ab7f-415a-b995-51552eb2b1e5
I could not repro this locally on 2 separate GPU machines, running master with 2 threads (the same number Buildkite uses). Oddly, it has succeeded twice on CI (once on trying and once on staging), but otherwise fails with the same result every time, which makes me suspect some environmental discrepancy between workers.