[Variant] Panic when appending nested objects to VariantBuilder #7907

Open
@alamb

Description

Describe the bug
While writing tests for another feature, I hit a panic when appending nested objects to the VariantBuilder.

To Reproduce
Here are some tests that run in parquet-variant/src/builder.rs:

    /// Test appending nested structures
    #[test]
    fn test_append_list() {
        let (m1, v1) = make_list();
        let variant = Variant::new(&m1, &v1);
        let mut builder = VariantBuilder::new();
        builder.append_value(variant.clone());
        let (metadata, value) = builder.finish();
        assert_eq!(variant, Variant::new(&metadata, &value));
    }

    /// make a simple List variant
    fn make_list() -> (Vec<u8>, Vec<u8>) {
        let mut builder = VariantBuilder::new();
        let mut list = builder.new_list();
        list.append_value(1234);
        list.append_value("a string value");
        list.finish();
        builder.finish()
    }

Test for object:

    #[test]
    fn test_append_object() {
        let (m1, v1) = make_object();
        let variant = Variant::new(&m1, &v1);
        let mut builder = VariantBuilder::new();
        builder.append_value(variant.clone());
        let (metadata, value) = builder.finish();
        assert_eq!(variant, Variant::new(&metadata, &value));
    }

    /// make an object variant
    fn make_object() -> (Vec<u8>, Vec<u8>) {
        let mut builder = VariantBuilder::new();
        let mut obj = builder.new_object();
        obj.insert("a", true);
        obj.finish().unwrap();
        builder.finish()
    }

Both tests panic:

internal error: entered unreachable code: Nested values are handled specially by ObjectBuilder and ListBuilder
thread 'builder::tests::test_append_object' panicked at parquet-variant/src/builder.rs:229:17:
internal error: entered unreachable code: Nested values are handled specially by ObjectBuilder and ListBuilder
stack backtrace:

Expected behavior
I expect both tests to pass.

Additional context
I don't think the builder can simply copy the underlying bytes directly into its output buffer in the general case, because the field ids of any embedded objects need to be updated to match the in-progress metadata.

So the first implementation probably needs to walk over the nested structures and append each element to the builder.

A future optimization could be to recognize when there are no embedded objects and, in that case, just copy the bytes.
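
To illustrate why a byte copy is not enough, here is a minimal self-contained sketch. It does not use the parquet-variant API: `Dictionary`, `Value`, `reappend`, and `contains_object` are hypothetical names for a toy model in which objects store field ids that only make sense relative to one dictionary. Re-appending a value walks the nested structure and re-interns every field name in the destination dictionary, and `contains_object` is the check that the byte-copy fast path would need.

```rust
use std::collections::HashMap;

/// Toy stand-in for variant metadata: maps field names to field ids.
/// (Hypothetical type, not part of parquet-variant.)
#[derive(Debug, Default)]
struct Dictionary {
    names: Vec<String>,
    ids: HashMap<String, u32>,
}

impl Dictionary {
    /// Return the id for `name`, adding it if it is not present yet.
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Look up the field name for `id`; panics if the id is unknown.
    fn name(&self, id: u32) -> &str {
        &self.names[id as usize]
    }
}

/// Toy stand-in for an encoded value: objects store field ids, not names.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
    Object(Vec<(u32, Value)>),
}

/// Walk `value` (encoded against `src`) and rebuild it against `dst`,
/// re-interning each field name so the ids match the destination dictionary.
fn reappend(value: &Value, src: &Dictionary, dst: &mut Dictionary) -> Value {
    match value {
        Value::List(items) => {
            Value::List(items.iter().map(|v| reappend(v, src, dst)).collect())
        }
        Value::Object(fields) => Value::Object(
            fields
                .iter()
                .map(|(id, v)| {
                    let new_id = dst.intern(src.name(*id));
                    (new_id, reappend(v, src, dst))
                })
                .collect(),
        ),
        // Primitives carry no field ids, so they can be copied as-is.
        primitive => primitive.clone(),
    }
}

/// True if `value` has an object anywhere inside it. If not, there are no
/// field ids to rewrite and a byte-for-byte copy would be safe.
fn contains_object(value: &Value) -> bool {
    match value {
        Value::Object(_) => true,
        Value::List(items) => items.iter().any(contains_object),
        _ => false,
    }
}
```

In this model, a field named "a" that has id 1 in the source dictionary gets id 0 when re-appended into an empty destination dictionary, which is exactly the mismatch a raw byte copy would leave in place.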
