Skip to content

Fix in list round trip in df proto #16744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 12, 2025
Merged

Conversation

XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Jul 10, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Credits:

  1. @ding-young for identifying the root cause, i.e., index should be applied on projection schema, not the full schema
  2. Claude for fixing it

I looked at the patch and can't produce anything better than Claude

(add XiangpengHao/liquid-cache#293 for record)

@github-actions github-actions bot added the proto Related to proto crate label Jul 10, 2025
@XiangpengHao
Copy link
Contributor Author

Ok I run a bisect and find the code is broken since #15769, I guess it's because the filter is moved to a different sub-structure.

@adriangb can you take a look if this is the right way to fix it?

@NGA-TRAN
Copy link
Contributor

NGA-TRAN commented Jul 11, 2025

Thanks @XiangpengHao for the fix. Could you also run the tests in this PR? The deserialization bug only happens to one tpc-h query (q16) but when I run roundtrip tests, only 4 out of 22 queries pass. Hopefully, this fix will also solve that

@adriangb
Copy link
Contributor

adriangb commented Jul 11, 2025

@adriangb can you take a look if this is the right way to fix it?

I took an initial look and... I'm a bit stumped. I don't fully understand where this is running or how, I haven't looked too deeply into proto serialization of plans and I don't know if DataDog is doing that before or after optimizer rules are applied, etc.
What I'll say about #15769 is that it shouldn't have changed the structure of FilterExec. It just moves the filters from FilterExec into ParquetSource(possibly deleting the FilterExec). In other words: the plan is being rewritten in the optimizer pass, but I think the final state of the ExecutionPlan tree should not have changed in terms of serialization.

But I think I need to take another look at this.

I'll be completely AFK for the the weekend until Tuesday and have an early end to my day today so if I don't get back to give a deeper analysis for this I suggest that as long as:

  1. This fixes the tests.
  2. Does not break any other tests (especially Add serialization/deserialization and round-trip tests for all tpc-h queries #16742).

It's worth just proceeding with it and I can try to find a deeper meaning to it next week.

Might also be worth asking Claude to explain why this is the fix to give us pointers into what to look at?

@NGA-TRAN
Copy link
Contributor

@XiangpengHao: If you believe the round-trip bug reproduced in test_round_trip_tpch_queries from PR #16742 is distinct, we can file a separate issue and tackle it independently. @LiaCastaneda is interested and may be available to work on a fix.

@XiangpengHao
Copy link
Contributor Author

@XiangpengHao: If you believe the round-trip bug reproduced in test_round_trip_tpch_queries from PR #16742 is distinct, we can file a separate issue and tackle it independently. @LiaCastaneda is interested and may be available to work on a fix.

I think it's a different bug, see #16742 (comment)

@LiaCastaneda feel free to fix it!

@XiangpengHao
Copy link
Contributor Author

XiangpengHao commented Jul 11, 2025

@adriangb can you take a look if this is the right way to fix it?

I took an initial look and... I'm a bit stumped. I don't fully understand where this is running or how, I haven't looked too deeply into proto serialization of plans and I don't know if DataDog is doing that before or after optimizer rules are applied, etc. What I'll say about #15769 is that it shouldn't have changed the structure of FilterExec. It just moves the filters from FilterExec into ParquetSource (possibly deleting the FilterExec). In other words: the plan is being rewritten in the optimizer pass, but I think the final state of the ExecutionPlan tree should not have changed in terms of serialization.

But I think I need to take another look at this.

I'll be completely AFK for the the weekend until Tuesday and have an early end to my day today so if I don't get back to give a deeper analysis for this I suggest that as long as:

  1. This fixes the tests.
  2. Does not break any other tests (especially Add serialization/deserialization and round-trip tests for all tpc-h queries #16742).

It's worth just proceeding with it and I can try to find a deeper meaning to it next week.

Might also be worth asking Claude to explain why this is the fix to give us pointers into what to look at?

Agree! I think it's not that #15769 introduces the bug, it just surfaces up the bug. I plan to dig deeper into this over the weekend.

Comment on lines 1741 to +1742
// Failing due to https://github.com/apache/datafusion/pull/16662
#[ignore]
// Fixed: Column index mismatch during protobuf deserialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Failing due to https://github.com/apache/datafusion/pull/16662
#[ignore]
// Fixed: Column index mismatch during protobuf deserialization
// Test case for https://github.com/apache/datafusion/pull/16662

@alamb
Copy link
Contributor

alamb commented Jul 11, 2025

Thank you @XiangpengHao

FYI @NGA-TRAN and @LiaCastaneda

@alamb alamb merged commit ce3f62a into apache:main Jul 12, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TPC-H Q16 fails during deserialization
4 participants