Fix issue with migrating MANAGED hive_metastore table to UC (#2892)
- HMS MANAGED tables, when dropped, also delete their underlying data.
- If an HMS MANAGED table is migrated to UC as EXTERNAL, dropping the
HMS table deletes the underlying data files and renders the UC table
unusable, leading to non-recoverable data loss.
- Converting the MANAGED table to EXTERNAL may have consequences for
regulatory data cleanup: dropping an EXTERNAL table no longer
deletes the underlying data, so dropped tables would leak data.
- As with duplicating the data, if new data is added to either the HMS
or the UC table, the other goes out of sync and requires
re-migration.
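
Before migrating, it helps to confirm whether an HMS table is MANAGED or EXTERNAL. A minimal Databricks SQL sketch, assuming a hypothetical table `hive_metastore.sales.orders`:

```sql
-- Hypothetical table name; substitute your own.
DESCRIBE TABLE EXTENDED hive_metastore.sales.orders;
-- In the output, the "Type" row reads MANAGED or EXTERNAL.
-- DROP TABLE on a MANAGED table deletes its data files, so a UC
-- EXTERNAL table pointing at the same location becomes unusable.
```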
The tradeoffs are described in this table:
<img width="639" alt="image"
src="https://github.com/user-attachments/assets/04456618-aee4-4859-88b9-c612a9629429">
Resolves #2838
| Table Type | Description | Migration Process |
|---|---|---|
| EXTERNAL_SYNC | Tables not saved to the DBFS file system that are supported by the SYNC command.<br/>These tables are in one of the following formats: DELTA, PARQUET, CSV, JSON, ORC, TEXT, AVRO. | During the upgrade process, the table contents remain intact and the metadata is recreated in UC using the SYNC SQL command.<br/>More information about the SYNC command can be found [here](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-sync.html). |
| EXTERNAL_HIVESERDE | Tables with table type "HIVE" that are not supported by the SYNC command. | We provide two workflows for hiveserde table migration:<br/>1. Migrate all hiveserde tables using CTAS, which we officially support.<br/>2. Migrate certain types of hiveserde in place. This is technically working, but the user needs to accept the risk that the old files created by hiveserde may not be processed correctly by the Spark datasource in corner cases.<br/>The user will need to decide which workflow to run first; that workflow migrates the hiveserde tables and marks the `upgraded_to` property, so those tables are skipped in later migration workflow runs. |
| EXTERNAL_NO_SYNC | Tables not saved to the DBFS file system that are not supported by the SYNC command. | The upgrade process migrates these tables to UC by creating a new managed table in UC and copying the data from the old table to the new table. The new table's format will be Delta. |
| DBFS_ROOT_DELTA | Tables saved to the DBFS file system that are in Delta format. | The upgrade process creates a copy of these tables in UC using the deep clone command.<br/>More information about the deep clone command can be found [here](https://docs.databricks.com/en/sql/language-manual/delta-clone.html). |
| DBFS_ROOT_NON_DELTA | Tables saved to the DBFS file system that are not in Delta format. | The upgrade process creates a managed table using CTAS. |
| VIEW | Database views. | Views are recreated during the upgrade process, and the view definition is modified to point to the new UC tables. Views should be migrated only after all the dependent tables have been migrated. The upgrade process accounts for view-to-view dependencies. |
| MANAGED | Tables created as managed tables in hive_metastore. | Depends on the WorkspaceConfig property `managed_table_external_storage`:<br/>1. If the property is set to the default, CLONE (selected during installation), the UC table is created with CTAS, which makes a copy of the data in UC.<br/>2. If the property is set to SYNC_AS_EXTERNAL, the UC table is created as an EXTERNAL table. This carries a risk: dropping the managed HMS table deletes the underlying data, which affects the UC table as well. |
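
For reference, here is a hedged sketch of the three SQL mechanisms the table above relies on (SYNC, deep clone, and CTAS); the catalog, schema, and table names are hypothetical placeholders:

```sql
-- EXTERNAL_SYNC path: recreate the metadata in UC, leaving files in place.
SYNC TABLE main.sales.orders FROM hive_metastore.sales.orders;

-- DBFS_ROOT_DELTA path: copy a DBFS-root Delta table into UC via deep clone.
CREATE TABLE main.sales.orders_clone DEEP CLONE hive_metastore.sales.orders;

-- CTAS paths (EXTERNAL_NO_SYNC, DBFS_ROOT_NON_DELTA, MANAGED with CLONE):
-- copy the data into a new UC managed table; the result is in Delta format.
CREATE TABLE main.sales.orders_copy AS
SELECT * FROM hive_metastore.sales.orders;
```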
The upgrade process can be triggered using the `migrate-tables` [UCX command](#migrate-tables-command), or by running the table migration workflows deployed to the workspace.