
Commit dbcff60

phillipleblanc, y-f-u, peasee, Sevenannn, and sgrebnov authored
Merge spiceai changes from the past 6 months (#255)
* DuckDB streaming (#41)
* wip
* duckdb streaming
* clippy
* arrow to arrow stream
* error message
* fix: Support `INTERVAL` in SQLite (#85)
* poc: Support interval in SQLite using an AST analyzer
* Refactoring
* u64 -> i64
* fix: Support INTERVAL expressions in SQLite
* docs: Add comment about flattening arguments list
* refactor: Rename SQLiteVisitor to SQLiteIntervalVisitor
* test: Add some tests
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Use DuckDB streaming
* Fixes
* Fix feature flagging
* Fix lint
* Add spiceai branch to pull_request
---------
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Add feature flag to disable postgres federation
* fix: Disable federation in memory mode databases (#86)
* Only disable federation for tableproviderfactory
* SQLite: Validate expected indexes when attaching local datasets (#88)
* SQLite: Validate expected indexes when attaching local datasets
* Add test for indexes creation and retrieval (SQLite)
* Update warning messages
* SQLite: Validate expected primary keys when attaching local datasets (#89)
* Change to use Spice AI fork of sea_query for SQLite decimal support (#90)
* fix: Don't silence blocking task errors (#91)
* fix: Don't silence blocking task errors
* fix: Cover Ok(Err()) match arm for DuckDB writer handle
* refactor: Rename overloaded error e
* fix: Re-attach databases on each DuckDB query (#92)
* fix: Re-attach databases on each query
* Update src/sql/db_connection_pool/dbconnection/duckdbconn.rs
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Correctly handle mysql timestamp() and datetime() types (#96)
* Correctly handle mysql timestamp() and datetime() types
* Restructure MySQL test, add test for timestamp() types
* Include test for datetime types
* Postgres enum support (#100)
* Postgres enum support
* Add enum test as part of integration test
* update
* Remove the duplicate function
* fix
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Fix SQLite Invalid column type Real bug (#98)
* Prevent SQLite from writing incomplete data on errors (#101)
* DuckDBTableProviderFactory keeps track of opened instances (#105)
* Ignore CHECKPOINT errors (#107)
* Don't attempt to CHECKPOINT after writing to DuckDB (#108)
* SqliteTableProviderFactory keeps track of opened instances (#109)
* wip
* wip
* wip
* tweak
* Support all time() types in MySQL (#97)
* Support all time() types in MySQL
* Include test for time types
* Upgrade to Arrow 53, DataFusion 42 and DuckDB 1.1 (#111)
* Handle inconsistent scale in Postgres Numeric Type data (#110)
* Verify MySQL parameters and connections before creating connection pool (#113)
* Verify MySQL parameters and connections before creating connection pool
* Update
* Propagate MySQL wrong table error (#114)
* Fix MySQL timestamp type (#116)
* Postgres should respect target decimal precision and scale (#120)
* Update row -> arrow conversion for all MYSQL_TYPE_VAR_STRING and MYSQL_TYPE_STRING types (#118)
* Use Decimal256 instead of Decimal128 for MySQL decimal type (#115)
* Fix mysql blob & text types (#117)
* Add sqlite_busytimeout parameter as user configurable param (#121)
* Add sqlite_busytimeout parameter as user configurable param
* Remove debug log
* Fix lint, fix integration test
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Use arrow dictionary type for mysql enum type (#119)
* Remove prefix for sqlite busy timeout param (#123)
* Support parsing sqlite busy_timeout durations with units (#124)
* Support retries when writing data to SQLite (#125)
* Implement write retry for DuckDB (#128)
* Preserve records batch order (update datafusion-federation) (#130)
* fix: add ast analyzer for mysql rank (#131)
* fix: Add AST analyzer for rewriting rank() in MySQL
* test: Add new test, fix other SQLite tests
* docs: Clarify what frame clauses are ignored in
* chore: Clippy
* fix: Remove NULLS FIRST/LAST in more MySQL Window functions (#133)
* Update datafusion-federation crate to include `unnest` support (#136)
* Update datafusion-federation to the latest (#137)
* Update datafusion-federation (improve filters pushdown) (#149)
* tests: Ensure all tests run on PRs, try to fix flaky DuckDB test (#150) (#152)
* Implement native schema inference for PostgreSQL (#151)
* wip
* test cleanup
* Implement native schema inference for PostgreSQL
* handle bpchar
* fix uuid
* Fix test
* align
* fix snapshot
* Fix DuckDB error messages (#154)
* SQLite: use projected schema when converting records (#158)
* feat: Enable in-memory federation (#159)
* feat: Enable federation for in-memory tables
* test: Update SQLite test
* test: Update tests
* Always read TimezoneTZ from PostgreSQL as UTC (#161)
* fix: Prevent absolute sequences in file paths (#160)
* fix: Prevent absolute sequences in file paths
* refactor: Make checks more robust
* refactor: Check for symlinks, make errors more robust
* test: Update test
* fix: Use path::absolute instead of canonicalize
* Remove restriction on file being in working directory (#163)
* Include unnecessary columns pruning step during federated plan creation (#162)
* feat: add duckdb checks for unsupported column types (#164)
* feat: Add DuckDB checks for unsupported column types
* deps: Update Cargo.toml
* fix: Make serde non-optional, let InvalidTypeAction be Copy
* fix: More features shenanigans
* fix: DuckDB boolean list support (#169)
* Upgrade to DataFusion 43 (#167)
* Upgrade to DataFusion 43
* Support Utf8View & BinaryView
* Support nested utf8view & binaryview
* Use DuckDB Dialect and update Datafusion patch (#170)
* Free up disk space for integration test (#171)
* Drop postgres containers once test is finished
* Add step to free disk space in integration test job
* update
* Fix `MySQLConnection::get_schema` for uppercase `TableReference` (#166)
* fix MySQLConnection::get_schema for uppercase TableReference
* Update mysqlconn.rs
* PR reviews
* fix clppy
* flatten
* Update mysqlconn.rs
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* Fix MySQL timestamp conversion when running via `MySQLSQLExec` (#173)
* Set MySQL session default time zone to UTC to match Datafusion (#174)
* Fix MySQL test (#181)
* Increase the waiting time for starting MySQL test container
* Update health check command
* Improve MySQL errors (#180)
* Further improve MySQL error to be concise and specific (#184)
* Further improve MySQL error to be concise and specifc
* Update src/sql/db_connection_pool/mysqlpool.rs
Co-authored-by: Scott Lyons <scottalyons@gmail.com>
---------
Co-authored-by: Scott Lyons <scottalyons@gmail.com>
* Update DuckDB Error messages (#182)
* Improve Postgres errors (#183)
* Separate MySQL error source to separate line (#185)
* refactor: Update dbconnection errors (#188)
* Handle invalid data types for Postgres (#191)
* Handle invalid data types for Postgres
* Fix lint issues
* Fix tests
* Fix integration test
* Fix writing to a Postgres table with a schema (#195)
* Fix insert statement when all columns are constraint columns (#196)
* Revert "Fix writing to a Postgres table with a schema (#195)"
This reverts commit afd31a7.
* Support Postgres table with a schema write (#197)
* Allow overriding the default DuckDB dialect (#201)
* Fix DuckDBDialect creation according to DataFusion DuckDBDialect update (#202)
* DuckDB: support for nested types in Lists (Struct, List, FixedSizeList) (#203)
* Fix datafusion federation (#200)
* Update datafusion-federation to fix unnest support (#206)
* Use random id (#205)
* Fix column name rewrite when column alias has same name as table (#207)
* fix: Don't silence disk full errors with SQLite (#208)
* fix: Don't silence disk full in SQLite
* chore: Remove commented out line
* fix: Validate schema for SQLite connections (#209)
* fix: SQLite validate schema to stop panicking
* fix: SQLite does not support dictionary
* chore: Fix clippy
* fix: SQLite does not support Map
* chore: Clippy
* fix: Optional dependencies
* chore: Clippy
* chore: More clippy
* refactor: Move SchemaValidator implementations into DB modules
* Fix dremio subquery unparsing (#210)
* fix: Don't panic on unsupported data insert with Postgres (#211)
* fix: update datafusion-federation with the fix to preserver OFFSET and LIMIT in logical plan (#212)
* fix: update datafusion-federation with the fix to preserver OFFSET and LIMIT in logical plan
* fix: update sql dependency
* fix: revert formatting
* Update `datafusion-federation` to support multi-level table references (#213)
* Update datafusion-federation
* Update federation
* Postgres: add schema validation for record batches during write (#215)
* Fix TableScan filter rewrite & column expressions rewrite (#214)
* Fix TableScan filter rewrite
* update datafusion federation patch
* Federation fix for outer ref columns (#217)
* Fix table_reference (#218)
* Revert "Fix table_reference (#218)"
This reverts commit 3d5336e.
* Revert "Federation fix for outer ref columns (#217)"
This reverts commit 7dc094e.
* Update federation to fix correlated subquery bug
* fix: Use Unparser for expr to sql (#226)
* fix: Use Unparser for expr to sql
* chore: Remove println
* fix: Always cast to BIGINT
* Fix MySQL docker image for PR tests (#225)
* chore: Clippy
* fix: Install SQLite3
---------
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
* fix: Revert using unparser for filter pushdown (#227)
* Update federation commit: Add support for more AST expressions for multi-level rewrites
* fix: Postgres LargeUtf8 is equivalent to Utf8 (#231)
* fix: Postgres LargeUtf8 is equivalent to Utf8
* fix: Normalize both schemas
* fix: Optimize postgres schema loop
* fix: Field Clone/Copy shenanigans
* fix: Test
* fix: Test
* MySQL: include column name when failed to get a row value (#232)
* MySQL: treat MySQL special “zero” date '0000-00-00' as NULL (#233)
* Rename `InvalidTypeAction` to `UnsupportedTypeAction` (#234)
* Handle JSONB as UnsupportedTypeAction::String (#235)
* Fix constraint verification for columns with uppercase letters (#237)
* Upgrade to DuckDB v1.2.0 (#239)
* DuckDB: Use temp table only for append with defined resolution strategy (#242)
* Fix arrow-rs and chrono's quarter() conflict (#244)
* Revert "DuckDB: Use temp table only for append with defined resolution strate…" (#243)
This reverts commit 39be511.
* DuckDB: fix error handling during record batch insertion (#245)
* Bump secrecy version (#248)
* Add memory_limit support for DuckDB (#251)
* Revert "Bump secrecy version (#248)" (#252)
This reverts commit a1f7173.
* cargo lock
* fix merge
* compiling
* Fix lint and test issues
* Fix unit tests
* Fix integration tests
---------
Co-authored-by: yfu <fevin86@gmail.com>
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
Co-authored-by: Sevenannn <qianqliu@uw.edu>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Qianqian <130200611+Sevenannn@users.noreply.github.com>
Co-authored-by: Jack Eadie <jack.eadie0@gmail.com>
Co-authored-by: Scott Lyons <scottalyons@gmail.com>
Co-authored-by: Evgenii Khramkov <hey@ewgenius.me>
2 parents 238e655 + 7bbdb22 commit dbcff60


51 files changed, +2755 -965 lines changed

Cargo.lock

Lines changed: 27 additions & 0 deletions

Cargo.toml

Lines changed: 28 additions & 4 deletions
@@ -8,23 +8,33 @@ license = "Apache-2.0"
 description = "Extend the capabilities of DataFusion to support additional data sources via implementations of the `TableProvider` trait."

 [dependencies]
+arrow = "54.2.1"
+arrow-array = { version = "54.2.1", optional = true }
 arrow-flight = { version = "54.2.1", optional = true, features = [
     "flight-sql-experimental",
     "tls",
 ] }
+arrow-schema = { version = "54.2.1", optional = true, features = ["serde"] }
+arrow-json = "54.2.1"
 arrow-odbc = { version = "=15.1.1", optional = true }
 async-stream = { version = "0.3", optional = true }
 async-trait = "0.1"
+base64 = { version = "0.22.1", optional = true }
 bb8 = { version = "0.9", optional = true }
 bb8-postgres = { version = "0.9", optional = true }
 bigdecimal = "0.4"
 byteorder = "1.5.0"
+bytes = { version = "1.7.1", optional = true }
+byte-unit = { version = "5.1.4", optional = true }
 chrono = "0.4"
 dashmap = "6.1.0"
 datafusion = { version = "45", default-features = false }
+datafusion-expr = { version = "45", optional = true }
 datafusion-federation = { version = "=0.3.6", features = [
     "sql",
 ], optional = true }
+datafusion-physical-expr = { version = "45", optional = true }
+datafusion-physical-plan = { version = "45", optional = true }
 datafusion-proto = { version = "45", optional = true }
 duckdb = { version = "=1.2.0", features = [
     "bundled",
@@ -52,6 +62,7 @@ odbc-api = { version = "11.1", optional = true }
 pem = { version = "3.0.4", optional = true }
 postgres-native-tls = { version = "0.5.0", optional = true }
 prost = { version = "0.13", optional = true }
+rand = { version = "0.9" }
 r2d2 = { version = "0.8", optional = true }
 rusqlite = { version = "0.32", optional = true }
 sea-query = { version = "0.32", features = [
@@ -64,7 +75,7 @@ sea-query = { version = "0.32", features = [
     "with-chrono",
 ] }
 secrecy = "0.8.0"
-serde = { version = "1.0", optional = true }
+serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
 sha2 = "0.10"
 snafu = "0.8"
@@ -106,14 +117,16 @@ duckdb = [
     "dep:uuid",
     "dep:dyn-clone",
     "dep:async-stream",
+    "dep:arrow-schema",
+    "dep:byte-unit",
 ]
 duckdb-federation = ["duckdb", "federation"]
 federation = ["dep:datafusion-federation"]
 flight = [
     "dep:arrow-flight",
     "datafusion/serde",
     "dep:datafusion-proto",
-    "dep:serde",
+    "dep:prost",
     "dep:tonic",
 ]
 mysql = ["dep:mysql_async", "dep:async-stream"]
@@ -129,13 +142,24 @@ postgres = [
     "dep:native-tls",
     "dep:pem",
     "dep:async-stream",
+    "dep:arrow-schema",
 ]
 postgres-federation = ["postgres", "federation"]
-sqlite = ["dep:rusqlite", "dep:tokio-rusqlite"]
+sqlite = ["dep:rusqlite", "dep:tokio-rusqlite", "dep:arrow-schema"]
 sqlite-federation = ["sqlite", "federation"]
 sqlite-bundled = ["sqlite", "rusqlite/bundled"]

 [[example]]
 name = "odbc_sqlite"
 path = "examples/odbc_sqlite.rs"
-required-features = ["sqlite", "odbc"]
+required-features = ["sqlite", "odbc"]
+
+[[example]]
+name = "flight-sql"
+path = "examples/flight-sql.rs"
+required-features = ["flight"]
+
+[[example]]
+name = "sqlite"
+path = "examples/sqlite.rs"
+required-features = ["sqlite"]
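
Note on the feature entries above: the `sqlite`, `postgres`, and `duckdb` features pull in optional dependencies, and the new `[[example]]` entries are only built when their `required-features` are enabled (for instance `cargo run --example sqlite --features sqlite`). A minimal sketch of how Rust code can observe such flags at compile time follows. The `enabled_backends` helper is hypothetical and does not come from this crate; it only illustrates the `cfg!` mechanism.

```rust
// Hypothetical helper, not part of datafusion-table-providers: reports which
// of three feature names from the manifest were enabled at compile time.
#[allow(unexpected_cfgs)] // a standalone build does not declare these features
fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    // `cfg!` expands to a compile-time boolean for the given feature flag.
    if cfg!(feature = "sqlite") {
        backends.push("sqlite");
    }
    if cfg!(feature = "postgres") {
        backends.push("postgres");
    }
    if cfg!(feature = "duckdb") {
        backends.push("duckdb");
    }
    backends
}

fn main() {
    // Compiled without any `--features`, none of the branches is taken.
    println!("enabled backends: {:?}", enabled_backends());
}
```

In a real crate the gating is usually done with `#[cfg(feature = "...")]` on modules or items, so that code depending on an optional dependency is not compiled at all when the feature is off.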

examples/sqlite.rs

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
+use std::{sync::Arc, time::Duration};
+
 use datafusion::{prelude::SessionContext, sql::TableReference};
 use datafusion_table_providers::{
     common::DatabaseCatalogProvider,
     sql::db_connection_pool::{sqlitepool::SqliteConnectionPoolFactory, Mode},
     sqlite::SqliteTableFactory,
 };
-use std::sync::Arc;
-use std::time::Duration;

 /// This example demonstrates how to:
 /// 1. Create a SQLite connection pool
@@ -22,11 +22,11 @@ async fn main() {
         SqliteConnectionPoolFactory::new(
             "examples/sqlite_example.db",
             Mode::File,
-            Duration::default(),
+            Duration::from_millis(5000),
         )
         .build()
         .await
-        .expect("unable to create Sqlite connection pool"),
+        .expect("failed to create sqlite connection pool"),
     );

     // Create SQLite table provider factory
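
Note on the `Duration` change above: `Duration::default()` is a zero-length duration, while `Duration::from_millis(5000)` is five seconds. Assuming the third argument of `SqliteConnectionPoolFactory::new` is the SQLite busy timeout (the commit message mentions a configurable busy timeout, but this diff does not show the signature), the old example asked for no waiting at all on a locked database. The sketch below uses only the standard library; `describe_busy_timeout` is a hypothetical helper written for illustration.

```rust
use std::time::Duration;

// Hypothetical helper, not part of the crate: describes what a given
// busy-timeout value would mean for a connection that hits a locked database.
fn describe_busy_timeout(timeout: Duration) -> String {
    if timeout.is_zero() {
        // With a zero timeout there is no retry window.
        "no wait: a locked database is reported as busy immediately".to_string()
    } else {
        format!("wait up to {} ms for a lock to clear", timeout.as_millis())
    }
}

fn main() {
    let before = Duration::default(); // value used before this commit
    let after = Duration::from_millis(5000); // value used after this commit

    println!("before: {}", describe_busy_timeout(before));
    println!("after:  {}", describe_busy_timeout(after));
}
```

A non-zero value matters mostly when several connections write to the same database file, which is the situation the write-retry changes listed in the commit message also address.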

src/common.rs

Lines changed: 0 additions & 1 deletion
@@ -88,7 +88,6 @@ impl<T: 'static, P: 'static> SchemaProvider for DatabaseSchemaProvider<T, P> {
                 &self.name,
                 &self.pool,
                 TableReference::partial(self.name.clone(), table.to_string()),
-                None,
             )
             .await
             .map(|v| Some(Arc::new(v) as Arc<dyn TableProvider>))
