Check test cases with measurements #2161
With the new design, it will be possible to backfill results into the DB. For example, if you ask on a PR to see results for the Cranelift backend (which is not benchmarked by default), the collector will go back and backfill Cranelift backend data for the parent master commit.
To support that, we need to expand the notion of a benchmark being "done". Right now, when a benchmark begins, we record an
(artifact, benchmark_name)
tuple (called a step) into the DB, and if we ever encounter the same tuple again, we skip the benchmark. That's not ideal: if an error happened and no data was generated, you cannot retry the collection without removing everything for the given artifact from the DB. More importantly, you cannot backfill additional results (e.g. by running only Debug first and then backfilling Opt later), which is also useful for local experiments.

This PR expands the concept of a benchmark being done by actually checking which compile-time test cases are present in the DB. We cheat a bit for better performance: if there is at least one recorded statistic in the DB for a given test case, we consider it done. This essentially ignores missing iterations, but that should be a niche edge case.
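The "done" check described above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: the `TestCase` fields, the statistic-count map standing in for a DB query, and both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a compile-time test case
/// (the real schema has more dimensions, e.g. backend and target).
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct TestCase {
    benchmark: String,
    profile: String,  // e.g. "debug" or "opt"
    scenario: String, // e.g. "full" or "incr-full"
}

/// A test case counts as "done" if the DB has at least one recorded
/// statistic for it; missing iterations are deliberately ignored.
/// `recorded` stands in for a query of stored statistics per test case.
fn is_done(recorded: &HashMap<TestCase, u32>, tc: &TestCase) -> bool {
    recorded.get(tc).map_or(false, |&count| count > 0)
}

/// Given the test cases requested for an artifact, return only those
/// that still need to run -- these are the ones to backfill.
fn missing_test_cases<'a>(
    recorded: &HashMap<TestCase, u32>,
    requested: &'a [TestCase],
) -> Vec<&'a TestCase> {
    requested.iter().filter(|tc| !is_done(recorded, tc)).collect()
}
```

With this check, running only Debug first and later requesting Opt results in `missing_test_cases` returning just the Opt cases, so only those are benchmarked and backfilled.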
Even though this logic is mostly useful for the new scheme, which is not implemented yet, I decided to also implement it for the current benchmarking logic, because it's useful for local experiments.
Best reviewed commit by commit.