You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is easier to implement in workers rather than in the agent because
we don't need to keep track of the run count or concatenate logs within
the database. It does mean we get slightly weaker fault isolation
between runs (e.g. if one machine is having a bad day or low on memory
or whatever), but this is still an improvement.
This also means that the time taken per-regression increases, though an
exact factor is hard to work out (we never retry baseline builds, and we
only retry the first failed step...). Somewhere less than 5x though. In
practice the expectation is that most of our runs have very few
regressions (order of 1000 failed steps) compared to ~500,000 total
steps executed. That is ~0.2%. So even a large increase here is likely
to be acceptable.
If there's a large amount of regressions a single crater run may run
really slowly -- but probably it is also not very useful as a run, so we
may want some early-exit clause or confirmation step in that case
anyway. For now not too worried about that case.
0 commit comments