
[SPARK-52311][SQL] Fix bug with multiple self-references with UnresolvedSubqueries in rCTEs #51022


Open. Wants to merge 4 commits into base: master.

Pajaraja
Contributor

@Pajaraja Pajaraja commented May 26, 2025

What changes were proposed in this pull request?

Change how the output is handled when the column names of an rCTE are specified through a SubqueryAlias. We now make sure that the anchor produces the correct names by pushing the UnresolvedSubqueryColumnAlias into it, and that the UnionLoopRef has the same output as well.

Why are the changes needed?

Currently, rCTEs do not behave correctly when the anchor references the same column more than once, which leads to the wrong columns being identified in the recursion.
For example, this rCTE:

WITH RECURSIVE tmp(x) AS (
values (1), (2), (3), (4), (5)
), rcte(x, y) AS (
SELECT x, x FROM tmp WHERE x = 1
UNION ALL
SELECT x + 1, x FROM rcte WHERE x < 5
)
SELECT * FROM rcte;

currently returns:

1 1
2 2
3 3
4 4
5 5

instead of the expected:

1 1
2 1
3 2
4 3
5 4
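
To see why the second result is the expected one, the recursion can be traced by hand. The following is a minimal sketch in plain Python, not Spark code: it assumes the usual rCTE evaluation model (run the anchor once, then repeatedly apply the recursive branch to the rows produced by the previous iteration until no new rows appear). The function name `run_rcte` and the `frontier` variable are hypothetical and used only for illustration.

```python
def run_rcte():
    """Trace the example rCTE by hand (illustrative model, not Spark's implementation)."""
    # tmp(x) AS (values (1), (2), (3), (4), (5))
    tmp = [1, 2, 3, 4, 5]

    # Anchor: SELECT x, x FROM tmp WHERE x = 1
    # Both output columns come from the same source column, but they are
    # exposed as two distinct columns, rcte.x and rcte.y.
    anchor = [(x, x) for x in tmp if x == 1]

    result = list(anchor)
    frontier = anchor  # rows produced by the previous iteration

    while frontier:
        # Recursive branch: SELECT x + 1, x FROM rcte WHERE x < 5
        # Here x refers to the first column of rcte; y is not used.
        frontier = [(x + 1, x) for (x, _y) in frontier if x < 5]
        result.extend(frontier)

    return result


for row in run_rcte():
    print(*row)
```

Under this model the second column always carries the previous iteration's `x`, which is why each row after the first has `y` one less than `x`.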

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests in cte-recursion.sql golden file.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 26, 2025