Skip to content

Make dict ID only an IPC concern #7929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Jul 14, 2025

Which issue does this PR close?

Does not yet close, but contributes towards:

Rationale for this change

See the above issues. And this is a follow up to

This was also split out from: #7467

What changes are included in this PR?

The required changes, so that nothing depends on the canonical Schema's Field containing the dict_id field anymore, so that as a follow up, the dict_id field can actually be removed from the deprecated function signatures and remove the field itself.

Are these changes tested?

All tests continue to pass.

Are there any user-facing changes?

The function signatures of these publicly facing APIs changed to provide the appropriate access to the dict ID as it is represented in the respective IPC message(s):

  • flight_data_to_arrow_batch (arrow-flight)
  • arrow_data_from_flight_data (arrow-flight; sql)
  • read_record_batch (arrow-ipc)
  • read_dictionary (arrow-ipc)
  • FileDecoder::new (arrow-ipc)

@tustvold @alamb @thinkharderdev @adriangb

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Jul 14, 2025
@brancz brancz force-pushed the dict-id-ipc-only branch 3 times, most recently from d185110 to 6042909 Compare July 15, 2025 07:51
@brancz brancz force-pushed the dict-id-ipc-only branch from 6042909 to 8b4c8b6 Compare July 16, 2025 09:32
@brancz
Copy link
Contributor Author

brancz commented Jul 16, 2025

Actually while debugging the test failure, I realized I can split up this PR some more, by just opening the removal of the preserving dict id code paths.

@brancz
Copy link
Contributor Author

brancz commented Jul 17, 2025

Rebasing this on #7940 now.

alamb pushed a commit that referenced this pull request Jul 18, 2025
# Which issue does this PR close?

Does not yet close, but contributes towards:

- #6356 
- #5981 
- #1206

# Rationale for this change

See the above issues. And this is a follow up to

* #6711
* #6873

This was also split out from:
#7929

# What changes are included in this PR?

This removes the API to allow preserving `dict_id` set in the `Schema`'s
`Field` within arrow-ipc and arrow-flight. This is in an effort to
remove the `dict_id` field entirely and make it an IPC/flight-only
concern.

# Are these changes tested?

Yes, all existing tests continue to pass.

# Are there any user-facing changes?

Yes, these previously (in 54.0.0) deprecated functions/fields are
removed:

* `arrow_ipc::DictionaryTracker.set_dict_id`
* `arrow_ipc::DictionaryTracker::new_with_preserve_dict_id`
* `arrow_ipc::IpcWriteOptions.with_preserve_dict_id`
* `arrow_ipc::IpcWriteOptions.preserve_dict_id` (function and field)
* `arrow_ipc::schema_to_fb`
* `arrow_ipc::schema_to_bytes`
@alamb
Copy link
Contributor

alamb commented Jul 21, 2025

Close / reopen to rerun CI (I think github actions was having problems previously)

@alamb alamb closed this Jul 21, 2025
@alamb alamb reopened this Jul 21, 2025
@brancz
Copy link
Contributor Author

brancz commented Jul 22, 2025

One of the tests is actually failing correctly (one of the integration tests), and I've realized what I need to do, but I'll need a few days to get to it. I'll make sure to ping you when I've got it fixed!

@alamb alamb marked this pull request as draft July 23, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants