This is a simple project that uses the metadata of a Delta table to provide methods for reading Delta Lake tables
into either Polars or DuckDB, with better protocol support than the deltalake
package (most importantly: column mapping).
uv add deltalake2db
Install deltalake2db and duckdb using pip/uv/poetry/whatever you use.
Then you can use it like this:
import duckdb
from deltalake2db import duckdb_create_view_for_delta, get_sql_for_delta

with duckdb.connect() as con:
    sql = get_sql_for_delta("tests/data/faker2", duck_con=con)  # get the SELECT statement
    print(sql)
    # or let it create a view for you; the view points to the data as of this point in time
    duckdb_create_view_for_delta(con, "tests/data/faker2", "delta_table")
    rows = con.execute("select * from delta_table").fetchall()
If you'd like to manipulate the query, you can use get_sql_for_delta_expr, which returns a SQLGlot object.
Install deltalake2db and polars>=1.12 using pip/uv/poetry/whatever you use.
from deltalake2db import polars_scan_delta
lazy_df = polars_scan_delta("tests/data/faker2", storage_options=None) # pass storage options as you would for other polars-related stuff
df = lazy_df.collect()
- Column Mapping
- Almost all data types, including Structs/Lists; Map is yet to be done
- Test data types, including datetime
- Deletion Vectors
If a table uses an unsupported Delta Lake feature, this will raise a DeltaProtocolError, just as delta-rs does.
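A caller may want to catch that error and try another reader, for example DuckDB's Delta extension. The following is only a minimal sketch of that pattern, not part of this package: `read_with_fallback` and `FakeProtocolError` are hypothetical names, and the exception class is passed in as a parameter because the import location of DeltaProtocolError depends on your installed versions.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def read_with_fallback(
    read: Callable[[], T],
    fallback: Callable[[], T],
    protocol_error: Type[Exception],
) -> T:
    """Try the primary reader; use the fallback only on a protocol error."""
    try:
        return read()
    except protocol_error:
        # The table uses a feature the primary reader does not support.
        return fallback()


# Stand-in exception for demonstration only (hypothetical). In real code,
# pass the DeltaProtocolError class that your installed version exposes.
class FakeProtocolError(Exception):
    pass


def failing_reader() -> str:
    raise FakeProtocolError("unsupported reader feature")


result = read_with_fallback(failing_reader, lambda: "fallback result", FakeProtocolError)
print(result)
```

Any other exception type still propagates, so genuine bugs or I/O failures are not silently redirected to the fallback.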
For now, only az:// URLs for Azure are tested and supported in DuckDB. For Polars it's a lot easier, since Polars just uses the object_store
crate, so it should just work.
The package does some work to make DuckDB's "Azure Storage Options" work in Polars, to be able to use the same options.
This means you can:
- pass an absolute DuckDB-style path to Polars, meaning something like abfss://⟨my_storage_account⟩.dfs.core.windows.net/⟨my_filesystem⟩/⟨path⟩
- pass "chain" as an option, which will act like DuckDB's credential chain. This requires the azure-identity package.
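To make the absolute path shape concrete, here is a small sketch that assembles such a URL from its parts. The helper `build_abfss_url` is hypothetical and not part of deltalake2db, and the account, filesystem, and path values are made-up placeholders.

```python
def build_abfss_url(storage_account: str, filesystem: str, path: str) -> str:
    """Build an absolute abfss:// URL in the shape shown above."""
    # Strip stray slashes so the parts join with exactly one separator.
    clean_filesystem = filesystem.strip("/")
    clean_path = path.strip("/")
    return (
        f"abfss://{storage_account}.dfs.core.windows.net/"
        f"{clean_filesystem}/{clean_path}"
    )


# Placeholder values for illustration only.
url = build_abfss_url("myaccount", "myfilesystem", "/tables/faker2/")
print(url)
```

The resulting string is what you would pass as the table path; which storage options you additionally need depends on your setup.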
We also have the following projects around deltalake:
- LakeAPI for providing deltalake Tables
- Odbc2deltalake to load MS SQL Server/ODBC Tables to Deltalake
Or projects from other people:
- polars-deltalake, an experimental native Polars reader for Delta Lake
Also, the Delta Extension of DuckDB is already pretty good and should probably do whatever you need.