Skip to content

Implement physical plan serialization for COPY plans CsvLogicalExtensionCodec #11150

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

As part of #11060, @devinjdangelo made file format support into a Trait which is good!

However the code to serialize these new (dynamic) structures is not yet implemented

As @devinjdangelo says https://github.com/apache/datafusion/pull/11060/files#r1650268578

Users depending on the ability to serialize COPY plans (e.g. ballista) will need this TODO to be completed before upgrading to any version of datafusion including this change.

It turns out there are no unit tests for them either so no tests failed

Describe the solution you'd like

Implement the named codec for serializing plans and a test for it

Describe alternatives you've considered

The code is here: datafusion/proto/src/logical_plan/file_formats.rs
The test would go here: https://github.com/apache/datafusion/blob/main/datafusion/proto/tests/cases/roundtrip_physical_plan.rs

Note there is already coverage for LogicalPlans here:

async fn roundtrip_logical_plan_copy_to_sql_options() -> Result<()> {
let ctx = SessionContext::new();
let input = create_csv_scan(&ctx).await?;
let mut table_options = ctx.copied_table_options();
table_options.set_file_format(FileType::CSV);
table_options.set("format.delimiter", ";")?;
let plan = LogicalPlan::Copy(CopyTo {
input: Arc::new(input),
output_url: "test.csv".to_string(),
partition_by: vec!["a".to_string(), "b".to_string(), "c".to_string()],
format_options: FormatOptions::CSV(table_options.csv.clone()),
options: Default::default(),
});
let bytes = logical_plan_to_bytes(&plan)?;
let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
assert_eq!(format!("{plan:?}"), format!("{logical_round_trip:?}"));
Ok(())
}

Additional context

There are several other codecs needed:

  • JsonLogicalExtensionCodec
  • ParquetLogicalExtensionCodec
  • ArrowLogicalExtensionCodec
  • AvroLogicalExtensionCodec

However, I think we need to get one example done and then we can file tickets to fill out the others

Maybe this is what @lewiszlw was getting at with #11095 / https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/composed_extension_codec.rs

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions