Description
Regardless of whether a secondary catalog is attached, `dbt run` and `dbt compile` execute the following SQL query to list tables in the local DuckDB database schema. When an Iceberg catalog is attached, however, this query can take minutes.
```sql
select
    '<duckdb_database>' as database,
    table_name as name,
    table_schema as schema,
    CASE table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'VIEW' THEN 'view'
        WHEN 'LOCAL TEMPORARY' THEN 'table'
    END as type
from system.information_schema.tables
where lower(table_schema) = '<duckdb_schema>'
  and lower(table_catalog) = '<duckdb_database>'
```

Below is an excerpt from the dbt log showing this query taking over 90 seconds to complete. We have a modestly populated Glue catalog in the AWS account where we develop pipelines with dbt. All of our dbt-duckdb users whose projects attach to the Glue catalog report this problem consistently. It makes developing a dbt project very difficult, because every change to a model is usually followed by a quick test run to check the output. Users are opting to forego the Iceberg catalog integration entirely because they cannot iterate rapidly with it enabled.
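One possible mitigation (a sketch only, not verified against dbt-duckdb internals) would be to qualify `information_schema` with the local database name rather than going through the global `system` catalog, so DuckDB only enumerates metadata for the local database instead of every attached catalog:

```sql
-- Hypothetical alternative: scope the metadata scan to the local database
-- by catalog-qualifying information_schema, so the attached Iceberg/Glue
-- catalog does not have to be enumerated at all.
select
    'vocab_ingest_v1_4_0' as database,
    table_name as name,
    table_schema as schema,
    CASE table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'VIEW' THEN 'view'
        WHEN 'LOCAL TEMPORARY' THEN 'table'
    END as type
from vocab_ingest_v1_4_0.information_schema.tables
where lower(table_schema) = 'main'
```

Whether this is semantically equivalent for dbt's purposes depends on how the adapter uses the `table_catalog` filter, so treat it as a starting point rather than a drop-in fix.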
```text
18:50:41 Opening a new connection, currently in state closed
18:50:41 SQL status: OK in 0.001 seconds
18:50:41 Using duckdb connection "list_vocab_ingest_v1_4_0_main"
18:50:41 On list_vocab_ingest_v1_4_0_main: /* {"app": "dbt", "dbt_version": "1.10.11", "profile_name": "dbt_vocab_ingest", "target_name": "dev", "connection_name": "list_vocab_ingest_v1_4_0_main"} */
select
    'vocab_ingest_v1_4_0' as database,
    table_name as name,
    table_schema as schema,
    CASE table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'VIEW' THEN 'view'
        WHEN 'LOCAL TEMPORARY' THEN 'table'
    END as type
from system.information_schema.tables
where lower(table_schema) = 'main'
  and lower(table_catalog) = 'vocab_ingest_v1_4_0'
18:52:13 SQL status: OK in 92.597 seconds
18:52:13 On list_vocab_ingest_v1_4_0_main: ROLLBACK
18:52:13 Failed to rollback 'list_vocab_ingest_v1_4_0_main'
18:52:13 On list_vocab_ingest_v1_4_0_main: Close
18:52:13 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'dc2ca9af-81db-49ba-9597-80da292a9ab6', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x770194c9ffe0>]}
18:52:13 Using duckdb connection "master"
18:52:13 On master: BEGIN
18:52:13 Opening a new connection, currently in state init
18:52:13 SQL status: OK in 0.001 seconds
```
To give an idea of what the query above would return if it were unfiltered, here are counts of schemas and tables by catalog and table type:

```sql
SELECT
    table_catalog,
    table_type,
    COUNT(DISTINCT table_schema) AS "schema_count",
    COUNT(DISTINCT table_name) AS "table_count"
FROM system.information_schema.tables
GROUP BY 1, 2
ORDER BY 1, 2;
```

| table_catalog | table_type | schema_count | table_count |
|---|---|---|---|
| glue_catalog | BASE TABLE | 113 | 14198 |
| vocab_ingest_v1_4_0 | BASE TABLE | 2 | 4 |
In the test above, the local DuckDB database is named `vocab_ingest_v1_4_0`, so the filtered query returns only 4 tables, yet it still takes a minute and a half.
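For anyone trying to reproduce the measurement outside dbt, the same query can be timed directly in the DuckDB CLI (this assumes the Glue catalog is attached with the same configuration as the dbt profile; `.timer on` is the CLI's built-in per-statement timer):

```sql
-- DuckDB CLI session; run the same ATTACH statements as the dbt profile first.
.timer on
select count(*)
from system.information_schema.tables
where lower(table_schema) = 'main'
  and lower(table_catalog) = 'vocab_ingest_v1_4_0';
```

In our environment the elapsed time tracks the dbt log closely, which points at the `system.information_schema.tables` scan itself rather than dbt overhead.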
The dbt configurations used to attach the Iceberg catalog can be seen in Issue #639.