Skip to content

mz_join: efficient linear scan through times #33085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Jul 20, 2025

This PR introduces mz_join_core_v2, an evolution of mz_join_core that's meant to resolve a long-standing performance issue around processing input batches with a large number of distinct times. mz_join_core's match generation is quadratic in the number of distinct times, which often causes dataflows to struggle when they are expected to join collections with deep histories.

DD's join implements a more efficient match generation strategy that's linear in the number of distinct times. The reason mz_join_core doesn't employ that too is that mz_join_core wants to yield often to ensure responsiveness, which would require keeping self-referential state between operator invocations, a thing that's notoriously difficult to do in Rust.

mz_join_core_v2 extends mz_join_core by DD's the linear scan strategy. It solves the issue of self-referential state by wrapping the match generation logic into an async fn, which produces self-referential Futures we can store in the operator state. The operator can then poll these futures to advance the pending work, until it determines the need to yield control.

We implement mz_join_core_v2 as a new join variant, to enable a slow rollout. Once we are confident that it behaves correctly and doesn't cause performance regressions, we can remove mz_join_core.

Motivation

  • This PR implements a known-desirable feature.

Closes https://github.com/MaterializeInc/database-issues/issues/6777

Tips for reviewers

The diff is large, but the upper parts of mz_join_core_v2.rs can mostly be skipped because they just copy mz_join_core.rs. The first commit verbatim copies the latter into the former, so to see what parts actually changed, you can exclude the first commit from the diff.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the mz_join_v2 branch 4 times, most recently from 8f02418 to 5e1d9e3 Compare July 20, 2025 13:04
This commit copies the implementation of `mz_join_core` into a v2
version, without making any changes yet. The v2 is gated behind the
`enable_mz_join_core_v2` feature flag and switched on by default in CI.

`mz_join_core_v2` should become the only version in time, but we add it
as a v2 initially to derisk the rollout.
@teskje teskje marked this pull request as ready for review July 21, 2025 08:53
@teskje teskje requested review from a team as code owners July 21, 2025 08:53
@teskje teskje requested review from antiguru and frankmcsherry July 21, 2025 08:53
// of distinct times is small.
if self.history1.edits.len() < 10 || self.history2.edits.len() < 10 {
self.join_key_simple(key);
yield_now().await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is a call to yield_now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very I hope! The only thing it really does is check a bool and set a bool. We pass control back to Work::process here, where the yield_fn is invoked, which is probably more expensive.

If the yielding turns out to be bad for performance, we have the option of passing the yield_fn into the future and checking it here. I had that version before (hence the stale comment above) but reverted to the current form because it's simpler and doesn't seem to impact performance (according to our feature benchmarks at least). Passing the yield_fn into the future requires wrapping it into an Rc, and also requires passing the start_time in.

@antiguru antiguru requested a review from Copilot July 22, 2025 05:05
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces mz_join_core_v2, a new join implementation that optimizes performance for input batches with many distinct times by implementing linear scan through times rather than the quadratic approach used in the current mz_join_core. The implementation uses async/await to manage self-referential state that was previously difficult to handle in Rust.

  • Adds mz_join_core_v2 module with async-based linear time scan strategy
  • Updates configuration system to support the new join variant selection
  • Enables the new implementation by default in minimal system parameters

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
mz_join_core_v2.rs New join implementation with async futures and linear time scan
mz_join_core.rs Updated documentation to reference the new v2 implementation
linear_join.rs Integration of v2 implementation with configuration-based selection
join.rs Module declaration for the new v2 implementation
dyncfgs.rs New configuration flag for enabling mz_join_core_v2
action.py Added v2 config to parallel workload allowlist
init.py Enabled v2 by default in minimal system parameters
Comments suppressed due to low confidence (1)

src/compute/src/render/join/mz_join_core_v2.rs:602

  • [nitpick] The method call update on produced is misleading as it's actually decrementing the counter. Consider using a more descriptive method name or adding a comment explaining why the counter is being reduced.
            self.produced.update(|x| x - recovered);

@teskje teskje changed the title mz_join: efficient linar scan through times mz_join: efficient linear scan through times Jul 22, 2025
teskje added 4 commits July 22, 2025 11:38
This commit adjusts the doc comments of `mz_join_core` and
`mz_join_core_v2`, updating them to the current realities and stating
future intentions.
This commit introduces a new `Work` type in `mz_join_core_v2`, to
encapsulate the logic of keeping track of pending work and how much to
process in each step.
This commit changes how `mz_join_core_v2::Work` keeps pending work.
Instead of storing `Deferred` objects, it stores `Future`s, which is
polls to make progress on the pending work. The `work` method becomes
and async method, allowing us to yield at arbitrary points, without
needing to worry about self references held across yield points.
This commit changes `mz_join_core_v2` to use DD's strategy for
generating join matches. For small inputs, a simple strategy is used
that is quadratic in the number of distinct times, whereas for large
inputs updates are sorted by time, so we can perform a linear scan over
time.

The helper data structures, `EditList` and `ValueHistory`, are lifted
from DD, but simplified a bit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants