Skip to content

Remove error promotion from COALESCE #33103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 22, 2025

Conversation

frankmcsherry
Copy link
Contributor

We have an expression optimization that promotes literal errors around the documented evaluation order guards of COALESCE. It seems like this may be for "postgres compatibility", as postgres does a similar thing of erroring even when there are non-null values before the error (but only observed when the error is present in the query text, rather than as a result of data). We have situations where literal errors result from data, and need to guard the promotion of the error in the way that COALESCE is documented to do.

Seems hard to imagine fall-out from this, as we believe we are optimizing the performance of an error-producing case. We'll see what tests say!

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@frankmcsherry
Copy link
Contributor Author

A point worth discussing: this breaks PostgreSQL compatibility for a query like SELECT COALESCE(a, 1/0), where PostgreSQL will return an error even though "not evaluating" the 1/0. At the same time, this fixes a PostgreSQL compatibility error where it could return a correct answer and Materialize will error:

Postgres

mcsherry=# create table test (a int, b int);
CREATE TABLE
mcsherry=# insert into test values (1, 0);
INSERT 0 1
mcsherry=# select coalesce(a, 1/b) from test where b = 0;
 coalesce 
----------
        1
(1 row)

mcsherry=# 

Materialize

materialize=> create table test (a int, b int);
CREATE TABLE
materialize=> insert into test values (1, 0);
INSERT 0 1
materialize=> select coalesce(a, 1/b) from test where b = 0;
ERROR:  Evaluation error: division by zero
materialize=> 

The only way I can square this is that PostgreSQL replaces queries with errors if literal errors can occur in the query text, but then turns off that part of its brain in the course of query optimization. If we really want to respect this form of compatibility, it feels like a pass before any query transformations have occured.

@frankmcsherry frankmcsherry marked this pull request as ready for review July 21, 2025 20:53
@frankmcsherry frankmcsherry requested a review from a team as a code owner July 21, 2025 20:53
@ggevay
Copy link
Contributor

ggevay commented Jul 22, 2025

The Postgres docs say coalesce should short-circuit. So, I wouldn't try to match Postgres in those cases where it doesn't short-circuit, because these seem to be bugs in Postgres itself. So, I'd say the PR just fixes a bug where Materialize was also not short-circuiting, which is a good thing.

I did some more tests:

create table test (a int, b int);
insert into test values (1, 0);

# 1. main: ok,  PR: ok, Postgres: ok
select coalesce(a, 1/b) from test;

# 2. main: err, PR: ok, Postgres: err
select coalesce(a, 1/0) from test;

# 3. main: err, PR: ok, Postgres: ok
select coalesce(a, 1/b) from test where b = 0;

# 4. main: ok,  PR: ok, Postgres: ok
select coalesce(7, 1/b) from test;

# 5. main: ok,  PR: ok, Postgres: ok
select coalesce(7, 1/0) from test;

create table test_nonnull (a int not null, b int);
insert into test_nonnull values (1, 0);

# 1-nonnull. main: ok,  PR: ok, Postgres: ok
select coalesce(a, 1/b) from test_nonnull;

# 2-nonnull. main: ok,  PR: ok, Postgres: err
select coalesce(a, 1/0) from test_nonnull;

# 3-nonnull. main: ok,  PR: ok, Postgres: ok
select coalesce(a, 1/b) from test_nonnull where b = 0;

# 4-nonnull. main: ok,  PR: ok, Postgres: ok
select coalesce(7, 1/b) from test_nonnull;

# 5-nonnull. main: ok,  PR: ok, Postgres: ok
select coalesce(7, 1/0) from test_nonnull;


# Still buggy. main: err, PR: err, Postgres: ok
select coalesce(a, a/b + 1, a/b + 2) from test;

I'll add these in an slt and push to the PR.

Unfortunately, I can still make coalesce not short-circuit, see the last example above, where we have the plan:

                        Physical Plan                         
--------------------------------------------------------------
 Explained Query:                                            +
   →Read materialize.public.test                             +
                                                             +
 Source materialize.public.test                              +
   project=(#3)                                              +
   map=((#0{a} / #1{b}), coalesce(#0{a}, (#2 + 1), (#2 + 2)))+

We'd need to add a similar exception in MFP CSE for coalesce as we have for MSE::If:

// We should not eagerly memoize `if` branches that might not be taken.
// TODO: Memoize expressions in the intersection of `then` and `els`.
if let MirScalarExpr::If { cond, .. } = e {
return Some(vec![cond]);
}
I've added a note in the error semantics issue: https://github.com/MaterializeInc/database-issues/issues/4972 But also, this looks like an easy fix for a quick follow-up PR.

@frankmcsherry
Copy link
Contributor Author

We'd need to add a similar exception in MFP CSE for coalesce as we have for MSE::If:

This sounds worth doing and I'll take a look. I don't think we need extensive testing done yet though, in the absence of clear semantics (we had two tests for the previous version; the bug was that we didn't know what was supposed to happen). No harm, but it's more stuff we'll have to review once we do have an opinion on the error semantics (which @mgree is peeking at, so it's not wildly far-fetched).

@frankmcsherry
Copy link
Contributor Author

frankmcsherry commented Jul 22, 2025

Maybe said differently: I think the tests for "postgres compat" make sense, but we should keep them apart from "intended behavior". Having just deleted some of the "intended behavior" tests for not actually being the intended behavior, but rather inspired by postgres compat. :D

@ggevay
Copy link
Contributor

ggevay commented Jul 22, 2025

I think it's good to add tests, so that we at least know about it when the behavior changes. We'll need these tests anyway later when we do more comprehensive work on error semantics. Also, this is input to @mgree's work: These are some specific example queries that he can look at to see what his formal semantics say on them.

I've pushed the tests here: Could you please cherry-pick that commit? ggevay@0d2b6ae

Edit: Could you also add a comment above the new tests that we haven't totally decided yet whether this is the intended behavior?

@frankmcsherry
Copy link
Contributor Author

I think it's good to add tests, so that we at least know about it when the behavior changes.

So, we don't agree on this. Let's discuss async.

@frankmcsherry frankmcsherry enabled auto-merge (squash) July 22, 2025 12:46
Copy link
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@frankmcsherry frankmcsherry merged commit dd50ff0 into MaterializeInc:main Jul 22, 2025
121 checks passed
@frankmcsherry frankmcsherry deleted the de_error_coalesce branch July 22, 2025 13:10
frankmcsherry added a commit that referenced this pull request Jul 22, 2025
Observed in
#33103 (comment),
and an easy fix.

### Motivation

<!--
Which of the following best describes the motivation behind this PR?

  * This PR fixes a recognized bug.

    [Ensure issue is linked somewhere.]

  * This PR adds a known-desirable feature.

    [Ensure issue is linked somewhere.]

  * This PR fixes a previously unreported bug.

    [Describe the bug in detail, as if you were filing a bug report.]

  * This PR adds a feature that has not yet been specified.

[Write a brief specification for the feature, including justification
for its inclusion in Materialize, as if you were writing the original
     feature specification.]

   * This PR refactors existing code.

[Describe what was wrong with the existing code, if it is not obvious.]
-->

### Tips for reviewer

<!--
Leave some tips for your reviewer, like:

    * The diff is much smaller if viewed with whitespace hidden.
    * [Some function/module/file] deserves extra attention.
* [Some function/module/file] is pure code movement and only needs a
skim.

Delete this section if no tips.
-->

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [ ] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants