Skip to content

Encapsulate metadata for literals on to a FieldMetadata structure #16317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 11, 2025

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 7, 2025

Which issue does this PR close?

Rationale for this change

As discussed on #16170 (comment), we made an initial PR to add metadata to Expr::Literal as part of DataFusion 48, but the API could use a bit of fine tuning to:

  1. Encapsulate the details more (so we don't have to migrated between different Map types, for example)
  2. Make cloning faster

What changes are included in this PR?

  1. Encapsulate the representation of metadata into a struct FieldMetadata
  2. Wrap the inner with an Arc to make it fast to clone

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate labels Jun 7, 2025
@alamb alamb force-pushed the alamb/maybe_better_metadata branch from 9752ed6 to a916b4c Compare June 7, 2025 11:03
@github-actions github-actions bot removed documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate labels Jun 7, 2025
@alamb alamb changed the title Alamb/maybe better metadata Encapsulate metadata for literals on to a FieldMetadata structure Jun 7, 2025
@alamb alamb marked this pull request as draft June 7, 2025 11:05
@alamb
Copy link
Contributor Author

alamb commented Jun 7, 2025

I also found we can further unify the metadata handling for Expr::Alias as well, but would like to propose doing that as a follow on PR:

/// A constant value along with associated metadata
Literal(ScalarValue, Option<BTreeMap<String, String>>),
/// A constant value along with associated [`FieldMetadata`].
Literal(ScalarValue, Option<FieldMetadata>),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this PR is to put all metadata handling in a Struct to make it easier to work with (and more efficient)

metadata
.map(|m| m.into_iter().collect::<HashMap<String, String>>()),
)))
Ok(Transformed::yes(lit_with_metadata(utf8_val, metadata)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty good example of the kind of simplification this PR allows (don't have to translate back/forth between HashMap / BTreeMap in various places)

false => Some(new_metadata),
};

let metadata = metadata.as_ref().map(|m| FieldMetadata::from(m.clone()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb alamb marked this pull request as ready for review June 7, 2025 11:38
@alamb alamb requested a review from timsaucer June 7, 2025 11:49
Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This is a very nice cleanup of the prior code.

@alamb alamb merged commit e4166b3 into apache:main Jun 11, 2025
28 checks passed
@alamb
Copy link
Contributor Author

alamb commented Jun 11, 2025

Here is a follow on PR to also use the same structure in Expr::Alias (which also had metadata, but was stored as a HashMap)

@alamb alamb deleted the alamb/maybe_better_metadata branch June 11, 2025 04:14
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants