Add a join_core_yielding operator #390


Open: wants to merge 1 commit into base: master

teskje
Contributor

@teskje teskje commented Apr 26, 2023

This PR proposes an alternative to #389 that doesn't require changing the way effort is counted in the join operator. Instead, it adds a `JoinCore::join_core_yielding` operator that accepts a `yield_function` to control the join's yield behavior, similar to how the half join operator is configurable. The `yield_function` can decide to yield based on elapsed time and on the number of records produced.
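To make the idea concrete, here is a minimal sketch of what such a `yield_function` could look like. The `(Instant, usize) -> bool` shape (activation start time, records produced so far) and the helper name `yield_after` are assumptions made for illustration; they are not taken from the diff, so check the actual signature of `join_core_yielding` before relying on them.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of this PR): builds a yield function that
/// asks the operator to yield once `max_time` has elapsed since `start`
/// or once `max_records` records have been produced.
/// ASSUMPTION: the yield function receives the activation start time and
/// the number of records produced so far, and returns `true` to yield.
fn yield_after(max_time: Duration, max_records: usize) -> impl Fn(Instant, usize) -> bool {
    move |start: Instant, produced: usize| {
        start.elapsed() >= max_time || produced >= max_records
    }
}
```

A closure like `yield_after(Duration::from_millis(10), 1_000_000)` would yield on whichever limit is hit first. A record-only limit can be approximated by passing a very large duration.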

The `yield_function` replaces the fueling concept previously used by the join operator. Higher-level join operators that don't explicitly specify a `yield_function` keep the old behavior of yielding after 1 million produced records, so backward compatibility is maintained for everyone except direct users of `join_core_internal_unsafe`.

Some additional care is taken to ensure the `yield_function` is only checked after the join has made some progress. This avoids stuck joins caused by overly aggressive `yield_function`s. However, nothing prevents users from shooting themselves in the foot by specifying an overly lenient `yield_function` (e.g. one that always returns `false`) and then potentially running into OOMs. The current implementation can easily be modified to enforce a yield when the effort reaches a hardcoded value, if we think this safeguard would be valuable.
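The two behaviors described above can be illustrated with a toy model. This is not the operator's code: `run_activation`, `pending`, `batch_size`, and `effort_cap` are hypothetical names, and the `(Instant, usize) -> bool` yield function shape is an assumption. The sketch only shows that checking the yield function after each unit of work guarantees progress, and how an optional hardcoded cap could bound an overly lenient yield function.

```rust
use std::time::Instant;

/// Toy stand-in for one activation of a join (hypothetical, for illustration).
/// Work is done in batches; `yield_fn` is consulted only AFTER a batch has
/// been produced, so even an always-yielding function makes progress.
/// `effort_cap` models the optional hardcoded safeguard mentioned above.
/// Returns the number of records produced in this activation.
fn run_activation<Y>(pending: &mut usize, batch_size: usize, effort_cap: usize, yield_fn: Y) -> usize
where
    Y: Fn(Instant, usize) -> bool,
{
    assert!(batch_size > 0, "a batch must produce at least one record");
    let start = Instant::now();
    let mut produced = 0;
    while *pending > 0 {
        // Make progress first.
        let batch = batch_size.min(*pending);
        *pending -= batch;
        produced += batch;
        // Only now ask whether to yield; the cap overrides a lenient function.
        if yield_fn(start, produced) || produced >= effort_cap {
            break;
        }
    }
    produced
}
```

With `|_, _| true` as the yield function, each activation still produces one batch before yielding. With `|_, _| false`, the activation runs until the work is exhausted or the cap is reached.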

This commit adds a `JoinCore::join_core_yielding` operator that allows
specifying a `yield_function` to control the join's yield behavior,
similarly to how the half join operator is configurable. The
`yield_function` enables yielding based on time and number of produced
records.

The `yield_function` replaces the previous fueling concept used by the
join operator. However, higher-level join operators that don't
explicitly specify a `yield_function` still have the old behavior of
yielding after 1 million produced records, so backwards-compatibility is
maintained for all but direct users of `join_core_internal_unsafe`.

Some additional care is taken to ensure the `yield_function` is only
checked after the join has made some progress. This is to avoid stuck
joins caused by overly aggressive `yield_function`s.