-
Notifications
You must be signed in to change notification settings - Fork 473
mz_join: efficient linear scan through times #33085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8f02418
to
5e1d9e3
Compare
This commit copies the implementation of `mz_join_core` into a v2 version, without making any changes yet. The v2 is gated behind the `enable_mz_join_core_v2` feature flag and switched on by default in CI. `mz_join_core_v2` should become the only version in time, but we add it as a v2 initially to derisk the rollout.
// of distinct times is small. | ||
if self.history1.edits.len() < 10 || self.history2.edits.len() < 10 { | ||
self.join_key_simple(key); | ||
yield_now().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is a call to yield_now
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very I hope! The only thing it really does is check a bool and set a bool. We pass control back to Work::process
here, where the yield_fn
is invoked, which is probably more expensive.
If the yielding turns out to be bad for performance, we have the option of passing the yield_fn
into the future and checking it here. I had that version before (hence the stale comment above) but reverted to the current form because it's simpler and doesn't seem to impact performance (according to our feature benchmarks at least). Passing the yield_fn
into the future requires wrapping it into an Rc
, and also requires passing the start_time
in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces mz_join_core_v2
, a new join implementation that optimizes performance for input batches with many distinct times by implementing linear scan through times rather than the quadratic approach used in the current mz_join_core
. The implementation uses async/await to manage self-referential state that was previously difficult to handle in Rust.
- Adds
mz_join_core_v2
module with async-based linear time scan strategy - Updates configuration system to support the new join variant selection
- Enables the new implementation by default in minimal system parameters
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
mz_join_core_v2.rs | New join implementation with async futures and linear time scan |
mz_join_core.rs | Updated documentation to reference the new v2 implementation |
linear_join.rs | Integration of v2 implementation with configuration-based selection |
join.rs | Module declaration for the new v2 implementation |
dyncfgs.rs | New configuration flag for enabling mz_join_core_v2 |
action.py | Added v2 config to parallel workload allowlist |
init.py | Enabled v2 by default in minimal system parameters |
Comments suppressed due to low confidence (1)
src/compute/src/render/join/mz_join_core_v2.rs:602
- [nitpick] The method call
update
onproduced
is misleading as it's actually decrementing the counter. Consider using a more descriptive method name or adding a comment explaining why the counter is being reduced.
self.produced.update(|x| x - recovered);
This commit adjusts the doc comments of `mz_join_core` and `mz_join_core_v2`, updating them to the current realities and stating future intentions.
This commit introduces a new `Work` type in `mz_join_core_v2`, to encapsulate the logic of keeping track of pending work and how much to process in each step.
This commit changes how `mz_join_core_v2::Work` keeps pending work. Instead of storing `Deferred` objects, it stores `Future`s, which is polls to make progress on the pending work. The `work` method becomes and async method, allowing us to yield at arbitrary points, without needing to worry about self references held across yield points.
This commit changes `mz_join_core_v2` to use DD's strategy for generating join matches. For small inputs, a simple strategy is used that is quadratic in the number of distinct times, whereas for large inputs updates are sorted by time, so we can perform a linear scan over time. The helper data structures, `EditList` and `ValueHistory`, are lifted from DD, but simplified a bit.
This PR introduces
mz_join_core_v2
, an evolution ofmz_join_core
that's meant to resolve a long-standing performance issue around processing input batches with a large number of distinct times.mz_join_core
's match generation is quadratic in the number of distinct times, which often causes dataflows to struggle when they are expected to join collections with deep histories.DD's join implements a more efficient match generation strategy that's linear in the number of distinct times. The reason
mz_join_core
doesn't employ that too is thatmz_join_core
wants to yield often to ensure responsiveness, which would require keeping self-referential state between operator invocations, a thing that's notoriously difficult to do in Rust.mz_join_core_v2
extendsmz_join_core
by DD's the linear scan strategy. It solves the issue of self-referential state by wrapping the match generation logic into anasync fn
, which produces self-referentialFuture
s we can store in the operator state. The operator can then poll these futures to advance the pending work, until it determines the need to yield control.We implement
mz_join_core_v2
as a new join variant, to enable a slow rollout. Once we are confident that it behaves correctly and doesn't cause performance regressions, we can removemz_join_core
.Motivation
Closes https://github.com/MaterializeInc/database-issues/issues/6777
Tips for reviewers
The diff is large, but the upper parts of
mz_join_core_v2.rs
can mostly be skipped because they just copymz_join_core.rs
. The first commit verbatim copies the latter into the former, so to see what parts actually changed, you can exclude the first commit from the diff.Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.