mz_join: efficient linear scan through times #33085

teskje · 2025-07-20T10:02:17Z

This PR introduces mz_join_core_v2, an evolution of mz_join_core that's meant to resolve a long-standing performance issue around processing input batches with a large number of distinct times. mz_join_core's match generation is quadratic in the number of distinct times, which often causes dataflows to struggle when they are expected to join collections with deep histories.

DD's join implements a more efficient match generation strategy that's linear in the number of distinct times. mz_join_core doesn't use that strategy because it wants to yield often to ensure responsiveness, and yielding in the middle of the linear scan would require keeping self-referential state between operator invocations, a thing that's notoriously difficult to do in Rust.

mz_join_core_v2 extends mz_join_core with DD's linear scan strategy. It solves the issue of self-referential state by wrapping the match generation logic into an async fn, which produces self-referential Futures we can store in the operator state. The operator can then poll these futures to advance the pending work, until it determines the need to yield control.
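To illustrate the mechanism, here is a minimal std-only sketch, not the code from this PR. The names `YieldNow`, `sum_in_chunks`, `NoopWaker`, and the fields and methods of `Work` are made up for the example. It assumes a single pending future and a waker that does nothing, since the caller decides when to poll again. The point is that the `chunks` iterator borrows from `data`, both live inside the future, and the borrow is held across every `.await`.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` on the first poll and `Ready` on the second.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// Hypothetical stand-in for match generation. The iterator returned by
/// `chunks` borrows from `data` and stays alive across each yield point,
/// so the compiler-generated future is self-referential.
async fn sum_in_chunks(data: Vec<u64>, out: Rc<Cell<u64>>) {
    for chunk in data.chunks(2) {
        out.set(out.get() + chunk.iter().sum::<u64>());
        yield_now().await;
    }
}

/// Waker that does nothing; the caller re-polls on its own schedule.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical operator state that owns the pending future.
struct Work {
    todo: Option<Pin<Box<dyn Future<Output = ()>>>>,
}

impl Work {
    /// Polls the pending future once. Returns `true` when no work remains.
    fn step(&mut self) -> bool {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let done = match self.todo.as_mut() {
            Some(fut) => fut.as_mut().poll(&mut cx).is_ready(),
            None => true,
        };
        if done {
            self.todo = None;
        }
        done
    }
}
```

Each call to `Work::step` resumes the future exactly where it left off, so the caller can stop after any step and pick the work up again in a later invocation.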

We implement mz_join_core_v2 as a new join variant, to enable a slow rollout. Once we are confident that it behaves correctly and doesn't cause performance regressions, we can remove mz_join_core.

Motivation

This PR implements a known-desirable feature.

Closes https://github.com/MaterializeInc/database-issues/issues/6777

Tips for reviewers

The diff is large, but the upper parts of mz_join_core_v2.rs can mostly be skipped because they just copy mz_join_core.rs. The first commit verbatim copies the latter into the former, so to see what parts actually changed, you can exclude the first commit from the diff.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

This commit copies the implementation of `mz_join_core` into a v2 version, without making any changes yet. The v2 is gated behind the `enable_mz_join_core_v2` feature flag and switched on by default in CI. `mz_join_core_v2` should become the only version in time, but we add it as a v2 initially to derisk the rollout.

src/compute/src/render/join/mz_join_core_v2.rs

antiguru · 2025-07-21T12:15:39Z

src/compute/src/render/join/mz_join_core_v2.rs

+        //       of distinct times is small.
+        if self.history1.edits.len() < 10 || self.history2.edits.len() < 10 {
+            self.join_key_simple(key);
+            yield_now().await;


How expensive is a call to yield_now?

teskje · 2025-07-21

Not very I hope! The only thing it really does is check a bool and set a bool. We pass control back to Work::process here, where the yield_fn is invoked, which is probably more expensive.

If the yielding turns out to be bad for performance, we have the option of passing the yield_fn into the future and checking it here. I had that version before (hence the stale comment above) but reverted to the current form because it's simpler and doesn't seem to impact performance (according to our feature benchmarks at least). Passing the yield_fn into the future requires wrapping it into an Rc, and also requires passing the start_time in.
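For reference, the alternative described above could look roughly like the following sketch. It is not the reverted code. The `YieldFn` signature `(Instant, usize) -> bool`, the `produce` function, and the `drive` helper are assumptions made for the example. The sketch also passes a single fixed `start_time`, whereas a real operator would have to refresh it on every invocation.

```rust
use std::future::{poll_fn, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::Instant;

/// Assumed shape of the yield predicate: given the start time of the
/// invocation and the number of outputs produced, decide whether to yield.
type YieldFn = Rc<dyn Fn(Instant, usize) -> bool>;

/// Checks a bool and sets a bool: `Pending` once, then `Ready`.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(move |_cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            Poll::Pending
        }
    })
    .await
}

/// Hypothetical work loop that consults `yield_fn` itself and only yields
/// when asked to. Returns how often it yielded.
async fn produce(items: usize, yield_fn: YieldFn, start_time: Instant) -> usize {
    let mut produced = 0;
    let mut yields = 0;
    for _ in 0..items {
        produced += 1;
        if yield_fn(start_time, produced) {
            yields += 1;
            yield_now().await;
        }
    }
    yields
}

/// Waker that does nothing; `drive` simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` to completion and counts how often it returned `Pending`.
fn drive<T>(fut: impl Future<Output = T>) -> (T, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut pending = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, pending),
            Poll::Pending => pending += 1,
        }
    }
}
```

With this shape the future only hands control back when the predicate asks for it, at the cost of the extra `Rc` and `start_time` plumbing mentioned above.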

Copilot

Pull Request Overview

This PR introduces mz_join_core_v2, a new join implementation that optimizes performance for input batches with many distinct times by implementing linear scan through times rather than the quadratic approach used in the current mz_join_core. The implementation uses async/await to manage self-referential state that was previously difficult to handle in Rust.

Adds mz_join_core_v2 module with async-based linear time scan strategy
Updates configuration system to support the new join variant selection
Enables the new implementation by default in minimal system parameters

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| mz_join_core_v2.rs | New join implementation with async futures and linear time scan |
| mz_join_core.rs | Updated documentation to reference the new v2 implementation |
| linear_join.rs | Integration of v2 implementation with configuration-based selection |
| join.rs | Module declaration for the new v2 implementation |
| dyncfgs.rs | New configuration flag for enabling mz_join_core_v2 |
| action.py | Added v2 config to parallel workload allowlist |
| init.py | Enabled v2 by default in minimal system parameters |

Comments suppressed due to low confidence (1)

src/compute/src/render/join/mz_join_core_v2.rs:602

[nitpick] The method call update on produced is misleading as it's actually decrementing the counter. Consider using a more descriptive method name or adding a comment explaining why the counter is being reduced.

            self.produced.update(|x| x - recovered);

src/compute/src/render/join/mz_join_core_v2.rs

This commit adjusts the doc comments of `mz_join_core` and `mz_join_core_v2`, updating them to the current realities and stating future intentions.

This commit introduces a new `Work` type in `mz_join_core_v2`, to encapsulate the logic of keeping track of pending work and how much to process in each step.

This commit changes how `mz_join_core_v2::Work` keeps pending work. Instead of storing `Deferred` objects, it stores `Future`s, which it polls to make progress on the pending work. The `work` method becomes an async method, allowing us to yield at arbitrary points, without needing to worry about self references held across yield points.

This commit changes `mz_join_core_v2` to use DD's strategy for generating join matches. For small inputs, a simple strategy is used that is quadratic in the number of distinct times, whereas for large inputs updates are sorted by time, so we can perform a linear scan over time. The helper data structures, `EditList` and `ValueHistory`, are lifted from DD, but simplified a bit.
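The difference between the two strategies can be sketched for a single key as follows. This is a simplified illustration, not the `EditList`/`ValueHistory` code from this PR or from DD. It assumes totally ordered `u64` times, so the output time of a match is just the maximum of the two input times. DD itself handles partially ordered times through lattice operations. `Edit`, `Output`, `consolidate`, `join_simple`, and `join_linear` are names chosen for the example.

```rust
use std::collections::BTreeMap;

/// One update for a fixed key: (value, time, diff).
type Edit = (&'static str, u64, i64);
/// One join output: ((value1, value2), time, diff).
type Output = ((&'static str, &'static str), u64, i64);

/// Sorts outputs, sums diffs of equal (values, time), and drops zero diffs.
fn consolidate(mut out: Vec<Output>) -> Vec<Output> {
    out.sort();
    let mut result: Vec<Output> = Vec::new();
    for (vals, time, diff) in out {
        if let Some(last) = result.last_mut() {
            if last.0 == vals && last.1 == time {
                last.2 += diff;
                continue;
            }
        }
        result.push((vals, time, diff));
    }
    result.retain(|&(_, _, diff)| diff != 0);
    result
}

/// Simple strategy: match every edit against every edit of the other input.
/// The work is the product of the two edit counts.
fn join_simple(edits1: &[Edit], edits2: &[Edit]) -> Vec<Output> {
    let mut out = Vec::new();
    for &(v1, t1, d1) in edits1 {
        for &(v2, t2, d2) in edits2 {
            out.push(((v1, v2), t1.max(t2), d1 * d2));
        }
    }
    consolidate(out)
}

/// Linear scan: replay both inputs in time order and match each edit only
/// against the per-value accumulation of what was already replayed on the
/// other side. The work per edit depends on the number of distinct values,
/// not on the number of distinct times.
fn join_linear(edits1: &[Edit], edits2: &[Edit]) -> Vec<Output> {
    let mut sorted1 = edits1.to_vec();
    let mut sorted2 = edits2.to_vec();
    sorted1.sort_by_key(|&(_, t, _)| t);
    sorted2.sort_by_key(|&(_, t, _)| t);

    let mut seen1: BTreeMap<&'static str, i64> = BTreeMap::new();
    let mut seen2: BTreeMap<&'static str, i64> = BTreeMap::new();
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < sorted1.len() || j < sorted2.len() {
        // Replay whichever side has the earlier next edit.
        let take1 =
            j == sorted2.len() || (i < sorted1.len() && sorted1[i].1 <= sorted2[j].1);
        if take1 {
            let (v1, t1, d1) = sorted1[i];
            for (&v2, &d2) in &seen2 {
                out.push(((v1, v2), t1, d1 * d2));
            }
            *seen1.entry(v1).or_insert(0) += d1;
            i += 1;
        } else {
            let (v2, t2, d2) = sorted2[j];
            for (&v1, &d1) in &seen1 {
                out.push(((v1, v2), t2, d1 * d2));
            }
            *seen2.entry(v2).or_insert(0) += d2;
            j += 1;
        }
    }
    consolidate(out)
}
```

Every pair of edits is matched exactly once, at the moment the later of the two is replayed, so both functions produce the same consolidated output. The sort only pays off once there are enough distinct times, which is why a small-input cutoff like the one in the snippet under review makes sense.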

teskje force-pushed the mz_join_v2 branch 4 times, most recently from 8f02418 to 5e1d9e3 Compare July 20, 2025 13:04

teskje force-pushed the mz_join_v2 branch from 5e1d9e3 to dab4f5c Compare July 21, 2025 08:06

teskje marked this pull request as ready for review July 21, 2025 08:53

teskje requested review from a team as code owners July 21, 2025 08:53

teskje requested review from antiguru and frankmcsherry July 21, 2025 08:53

antiguru reviewed Jul 21, 2025

antiguru requested a review from Copilot July 22, 2025 05:05

Copilot AI reviewed Jul 22, 2025

teskje changed the title ~~mz_join: efficient linar scan through times~~ mz_join: efficient linear scan through times Jul 22, 2025

teskje added 4 commits July 22, 2025 11:38

mz_join: adjust doc comments

df4caad

mz_join_v2: introduce the Work type

e3ff8ed

teskje force-pushed the mz_join_v2 branch from dab4f5c to 2ffb81f Compare July 22, 2025 09:39
