Description
We would like to request a new feature that enables seamless reading of parquet files directly from Databricks Volumes connected to Azure ADLS Gen2 as dbt sources, without creating supporting external tables.
We are currently working with a client on a migration to dbt and Databricks. As part of this migration, we need to read parquet files stored in a Databricks Volume (connected to Azure ADLS Gen2) directly as dbt sources.
We attempted to use the dbt-external-tables package by:
• installing the package in our dbt project and running dbt deps
• specifying an external location in the source YAML files and running dbt run-operation stage_external_sources (a sketch of this configuration is shown below)
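For reference, the source configuration we tried looked roughly like this, following the package's external config for Spark/Databricks; the catalog, schema, column, and path names here are illustrative placeholders rather than our exact setup:

```yaml
# models/sources.yml -- illustrative dbt source using dbt-external-tables'
# `external` config; identifiers and paths are placeholders
version: 2
sources:
  - name: landing
    schema: landing
    tables:
      - name: raw_events
        external:
          # Unity Catalog Volume path backed by Azure ADLS Gen2
          location: '/Volumes/my_catalog/landing/raw_files/events/'
          using: parquet
        columns:
          - name: event_id
            data_type: string
          - name: event_ts
            data_type: timestamp
```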
However, we encountered the following error:
"Database Error [RequestId=8486ae44-b999-4846-b4e6-5deb1f7ef400 ErrorClass=INVALID_PARAMETER_VALUE] Unsupported path operation PATH_CREATE_TABLE on volume"
Additionally, we observed that the dbt-external-tables package always attempts to create a supporting table for the dbt source. Even when errors occur, we can see the CREATE EXTERNAL TABLE statement in the Databricks SQL warehouse logs.
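From the query history, the statement the package issues appears to be roughly of the following shape (a simplified reconstruction; the table name, columns, and path are placeholders). Since Unity Catalog Volumes only support path-based file access rather than table storage, pointing a LOCATION clause at a Volume path is presumably what raises the PATH_CREATE_TABLE error:

```sql
-- Rough reconstruction of the statement seen in the SQL warehouse logs;
-- table name, columns, and path are placeholders
create table my_catalog.landing.raw_events (
  event_id string,
  event_ts timestamp
)
using parquet
location '/Volumes/my_catalog/landing/raw_files/events/'
```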
We would like to request a feature that allows the package to:
• Read parquet files directly from Databricks Volumes in a seamless manner
• Avoid creating a table for dbt sources; instead, the dbt source would read the parquet files under the Volume directly (see the sketch after this list)
• Provide native support for Azure ADLS Gen2 integration through Databricks Volumes
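As one possible shape for this, assuming the read_files table-valued function available on recent Databricks runtimes and SQL warehouses, the source could compile to a direct path read instead of a CREATE TABLE statement (the path is a placeholder):

```sql
-- Sketch: querying parquet files directly from a Volume path via
-- read_files, with no supporting table created (path is a placeholder)
select *
from read_files(
  '/Volumes/my_catalog/landing/raw_files/events/',
  format => 'parquet'
);
```

Databricks also supports direct path queries of the form select * from parquet.`<path>`, so either form could serve as the compiled target for such a source.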
This feature would be particularly valuable for our client, an organization migrating to the dbt + Databricks stack that wants to leverage its existing data lake architecture without the complexity of managing additional external tables.
Would it be possible to implement this functionality? We're happy to provide additional details or collaborate on the implementation if needed.
Thank you for considering this feature request.