Skip to content

Conversation

scovich
Copy link
Contributor

@scovich scovich commented Sep 30, 2025

Which issue does this PR close?

Rationale for this change

Add a missing feature.

What changes are included in this PR?

Leveraging the recently added ListLikeArray trait, support all five list types when unshredding variant data.

Are these changes tested?

Yes -- all the list-related variant shredding integration tests pass now.

Are there any user-facing changes?

No.

@github-actions github-actions bot added parquet Changes to the parquet crate parquet-variant parquet-variant* crates labels Sep 30, 2025
@scovich
Copy link
Contributor Author

scovich commented Sep 30, 2025

CC @alamb @liamzwbao

Copy link
Contributor Author

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes for reviewers


//! Module for unshredding VariantArray by folding typed_value columns back into the value column.
use crate::arrow_to_variant::ListLikeArray;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice new interface, @liamzwbao !

///
/// [valid]: VariantMetadata#Validation
/// [Variant spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-array-basic_type3
#[derive(Debug, Clone, PartialEq)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We forgot to fix this PartialEq when we fixed the one for VariantObject.

Manual implementation below.

true
self.iter()
.zip(other.iter())
.all(|((name_a, value_a), (name_b, value_b))| name_a == name_b && value_a == value_b)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opportunistic simplification.

Comment on lines -41 to -51
($case_num:literal) => {
paste::paste! {
#[test]
fn [<test_variant_integration_case_ $case_num>]() {
all_cases()[$case_num - 1].run()
}
}
};

// Generates an error test case, where the expected result is an error message
($case_num:literal, $expected_error:literal) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opportunistic simplification: Instead of duplicating the macro definition, just use $(...)? syntax to capture (and respond to) an optional error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL -- thanks @scovich

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks (again) @scovich
😍 thank you again @scovich -- this is great

I think this PR is great evidence that the pattern introduced in

Is on the right track.

list_builder.finish();
Ok(())
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how we could test this reasonably 🤔 Maybe we can rework the tests in cast_to_variant to shred and then unshred an array and verify it survives round tripping 🤔

Comment on lines -41 to -51
($case_num:literal) => {
paste::paste! {
#[test]
fn [<test_variant_integration_case_ $case_num>]() {
all_cases()[$case_num - 1].run()
}
}
};

// Generates an error test case, where the expected result is an error message
($case_num:literal, $expected_error:literal) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL -- thanks @scovich

variant_test_case!(1, "Unshredding not yet supported for type: List(");
variant_test_case!(2, "Unshredding not yet supported for type: List(");
variant_test_case!(1);
variant_test_case!(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read this diff correctly that after this PR we handle all the test cases other than Decimal? If so, that is pretty rad 🤯

(btw I think the reference to https://github.com/apache/arrow-rs/issues/8329 above variant_test_case!(4); is old and can be removed )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read this diff correctly that after this PR we handle all the test cases other than Decimal?

I believe so!

the reference to https://github.com/apache/arrow-rs/issues/8329 above variant_test_case!(4); is old and can be removed

Good catch! Fixed in #8481, since it has nothing to do with variant arrays.

@scovich
Copy link
Contributor Author

scovich commented Sep 30, 2025

I think this PR is great evidence that the pattern introduced in

Is on the right track.

Perhaps you meant

@alamb
Copy link
Contributor

alamb commented Oct 1, 2025

I think this one is now ready to rebase and merge -- I tried to do it mechanically but I hit a conflict.

@scovich since you are more familiar with this code, could you rebase the PR so I can merge it?

@scovich
Copy link
Contributor Author

scovich commented Oct 1, 2025

could you rebase the PR so I can merge it?

Done!

@alamb alamb merged commit 60ce764 into apache:main Oct 2, 2025
19 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 2, 2025

Thanks @scovich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] [Shredding] Support typed_access for List

2 participants