-
Notifications
You must be signed in to change notification settings - Fork 473
Followups to the new subquery guard #32994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Followups to the new subquery guard #32994
Conversation
src/expr/src/relation/func.rs
Outdated
// Note that we can't get a 0 count here, since the `Reduce count` below us is not | ||
// an SQL `count`, but an MIR `count`, which has an empty output on an empty input. | ||
// If we get a negative count, then we have a negative accumulation error somewhere. | ||
soft_assert_or_log!( | ||
count >= 1, | ||
"GuardSubquerySize encountered invalid count: {}", | ||
count | ||
); | ||
if count != 1 { | ||
// TODO(ggevay): Add count to error msg after the old subquery code is gone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the value of this, or of extending the error message. Especially extending the error message (do you mean error variant?) as this will inflate the resource requirements of maintaining the errors, even in non-faulty operation (each distinct count will need to be maintained separately). This is a net-new test, untested in the version of the code prior to the new subquery guard, already tested in the operator that produces the count in the first place, and seems to be largely unactionable (other than pinging you to ask what it might mean). Is it very important to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the TODO comment:
Especially extending the error message (do you mean error variant?)
Yes, I meant adding it to the EvalError variant that we are creating here, and then also showing it when we print the eval error as a message to the user.
as this will inflate the resource requirements of maintaining the errors
Well, it's a tradeoff: It could help users debug their queries, but it would increase the resource requirements a bit when there are many distinct counts.
even in non-faulty operation (each distinct count will need to be maintained separately)
Well, in the happy path the only distinct count should be 1.
But it's not too important. I've removed the todo.
But the soft_assert_or_log
still seems good to me:
This is a net-new test, untested in the version of the code prior to the new subquery guard,
It would have been inconvenient/impossible to test this in the old subquery guard, but it's easy and cheap with the new subquery guard. I consider this to be an advantage of the new subquery guard.
already tested in the operator that produces the count in the first place,
You mean the Reduce aggregates=[count(*)]
? It seems this Reduce variant doesn't test for negative accumulations. (I'm not sure whether this was just an oversight, or it would be difficult to add the test.) For example, in the following query the count(*)
has a negative input, but the query doesn't soft_panic, or log any errors, or show a user-level error:
create table t(x int);
insert into t values (7);
select count(*) from (select * from t), repeat_row(-3);
However, in a scalar subquery situation, there is another operator somewhat nearby that does test for negative accumulations:
create table t(x int);
insert into t values (7);
select 5, (select 1 from (select * from t), repeat_row(-3));
This produces ERROR: Evaluation error: internal error: Non-positive multiplicity in DistinctBy
. This is the plan:
Optimized Plan
----------------------------------------------
Explained Query: +
With +
cte l0 = +
CrossJoin type=differential +
ArrangeBy keys=[[]] +
Project () +
ReadStorage materialize.public.t+
ArrangeBy keys=[[]] +
Constant +
- (() x -3) +
cte l1 = +
Union +
Map (1) +
Get l0 +
Project (#1) +
FlatMap guard_subquery_size(#0) +
Reduce aggregates=[count(*)] +
Get l0 +
Return +
Project (#1, #0) +
Map (5) +
Union +
Get l1 +
Map (null) +
Union +
Negate +
Distinct project=[] + <---- Comes from this
Project () +
Get l1 +
Constant +
- () +
+
Source materialize.public.t +
But maybe in a more complicated situation the Distinct
would be further away or would be completely absent.
and seems to be largely unactionable (other than pinging you to ask what it might mean). Is it very important to add?
I mean, it gives slightly more info on what is going on when we see it: even when something else also catches the negative accumulation, the new soft_assert says that a subquery guard also saw the negative accumulations. I think we should have as many negative accumulation checks as possible (without badly impacting performance) throughout the codebase, to help with localizing where a negative accumulation is originating from. The proposed soft_assert doesn't give us any groundbreaking debugging capabilities, but it also seems very cheap, so why not add it?
Btw. I've now modified it from a soft_assert
to just do tracing::error!
, because this is what we do in other negative accumulation checks.
7bd97d7
to
9558efa
Compare
9558efa
to
0f60cdb
Compare
These are some minor follow-ups to #32978
See individual commits.
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.