Add DeltaLake source support #31

danielbeach · 2025-03-07T19:53:13Z

Add a new method, read_deltalake(). It opens the Delta Lake table and gets the list of Parquet files that make up that table, so we can reuse ParquetDataSet. The list comes from calling dt.files(), which returns the table's Parquet files.
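A minimal sketch of the file-listing step described above, not the code in this PR. The helper names `delta_parquet_files` and `_open_delta_table` and the `open_table` parameter are hypothetical and exist only to keep the example self-contained; `DeltaTable` and its `files()` method come from the deltalake package, assuming a release that still provides `files()`.

```python
from typing import Callable, List, Optional


def _open_delta_table(table_uri: str, version: Optional[int] = None):
    """Open a table with the deltalake package (imported lazily, optional dependency)."""
    from deltalake import DeltaTable

    return DeltaTable(table_uri, version=version)


def delta_parquet_files(
    table_uri: str,
    version: Optional[int] = None,
    open_table: Callable = _open_delta_table,  # hypothetical hook, handy for tests
) -> List[str]:
    """Return the Parquet files that make up one version of a Delta Lake table.

    With version=None the latest version is used. The paths are returned
    exactly as dt.files() reports them; whether they are relative to the
    table root or absolute depends on how the table was written.
    """
    dt = open_table(table_uri, version)
    return list(dt.files())
```

The resulting list is what would then be handed to the existing Parquet reader. If the paths turn out to be relative, they need to be joined with the table location before use.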

wangrunji0408 · 2025-03-17T06:46:14Z

As far as I know, Delta Lake is not simply a list of parquet files, right? I think we need a dedicated DeltaLakeDataset to handle them properly.

danielbeach · 2025-03-17T16:58:18Z

> As far as I know, Delta Lake is not simply a list of parquet files, right? I think we need a dedicated DeltaLakeDataset to handle them properly.

Yes, but we are using the deltalake package: a DeltaTable instance gives us, via the files() method it provides, the list of Parquet files that make up the current version of the Delta Lake table. In theory we could add another dependency, such as polars or daft, to read the Delta Lake table as a DataFrame and then dump it to an Arrow table, but that seems more indirect.
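To make "the files that make up the current version" concrete, here is a simplified sketch of how a Delta transaction log resolves to a file list. It is an illustration, not the deltalake implementation: the function name `active_files` is hypothetical, and it only replays `add` and `remove` actions from commit files, ignoring checkpoints, deletion vectors, column mapping, and partition values stored in the log. Those omitted features are the kind of thing that can make a plain list of Parquet files an incomplete description of a table.

```python
import json
from typing import Dict, Iterable, List


def active_files(commits: Iterable[str]) -> List[str]:
    """Replay commit contents (newline-delimited JSON) in version order.

    Each commit is the text of one log file. A file is live in the
    resulting snapshot if it was added and not removed afterwards.
    """
    live: Dict[str, None] = {}  # dict keeps insertion order
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = None
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return list(live)


# Hypothetical three-commit log: two appends, then a rewrite of the first file.
example_commits = [
    '{"add": {"path": "part-a.parquet"}}',
    '{"add": {"path": "part-b.parquet"}}',
    '{"remove": {"path": "part-a.parquet"}}\n{"add": {"path": "part-c.parquet"}}',
]
print(active_files(example_commits))
```

Reading every Parquet file in the table directory would also pick up `part-a.parquet`, which is why the list has to come from the log rather than from a directory listing.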

danielbeach added 4 commits March 7, 2025 13:43

add deltalake source support

b2a53d8

remove unnessesary format

01bb896

update test

6df066d

update tests

27bfc22
