Commit 310d9ff

Added documentation to use Delta Live Tables migration (#3587)
## Changes

Added documentation for the usage and a detailed description of Delta Live Tables migration.

### Linked issues

Adds documentation for #2065

### Functionality

- [x] added relevant user documentation

### Tests

- [x] manually tested
1 parent ee57731 commit 310d9ff

File tree

2 files changed (+53 −2 lines)


docs/ucx/docs/process/index.mdx

Lines changed: 43 additions & 2 deletions
@@ -9,8 +9,9 @@ On a high level, the steps in the migration process are:
 2. [group migration](/docs/reference/workflows#group-migration-workflow)
 3. [table migration](/docs/process/#table-migration-process)
 4. [data reconciliation](/docs/reference/workflows#post-migration-data-reconciliation-workflow)
-5. [code migration](#code-migration)
-6. [final details](#final-details)
+5. [code migration](/docs/reference/commands#code-migration-commands)
+6. [delta live table pipeline migration](/docs/process#delta-live-table-pipeline-migration-process)
+7. [final details](#final-details)
 
 The migration process can be visualized schematically as:

@@ -288,6 +289,7 @@ databricks labs ucx revert-migrated-tables --schema X --table Y [--delete-manage
 The [`revert-migrated-tables` command](/docs/reference/commands#revert-migrated-tables) drops the Unity Catalog table or view and resets
 the `upgraded_to` property on the source object. Use this command to allow migrating a table or view again.

+
 ## Code Migration
 
 After you're done with the [table migration](#table-migration-process) and
@@ -307,6 +309,45 @@ After investigating the code linter advice, code can be migrated. We recommend
 - Use the [`migrate-` commands](/docs/reference/commands#code-migration-commands) to migrate resources.
 - Set the [default catalog](https://docs.databricks.com/en/catalogs/default.html) to Unity Catalog.
 
+
+## Delta Live Table Pipeline Migration Process
+
+> You are required to complete the [assessment workflow](/docs/reference/workflows#assessment-workflow) before starting the pipeline migration workflow.
+
+The pipeline migration process is a workflow that clones Hive Metastore Delta Live Table (DLT) pipelines to Unity Catalog.
+Upon the first update, the cloned pipeline copies over all data and checkpoints, and then runs normally thereafter. Once the cloned pipeline reaches the 'RUNNING' state, both the original and the cloned pipeline can run independently.
+
+#### Example
+
+If the existing HMS DLT pipeline is called "dlt_pipeline", the pipeline will be stopped and renamed to "dlt_pipeline [OLD]", and the new cloned pipeline will be named "dlt_pipeline".
+
+### Known issues and limitations
+
+- Only clones from HMS to UC are supported.
+- Pipelines may only be cloned within the same workspace.
+- HMS pipelines must currently be publishing tables to some target schema.
+- Only the following streaming sources are supported:
+  - Delta
+  - [Autoloader](https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/index.html)
+    - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning, as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
+  - Kafka, where `kafka.group.id` is not set
+  - Kinesis, where `consumerMode` is not "efo"
+- [Maintenance](https://docs.databricks.com/en/delta-live-tables/index.html#maintenance-tasks-performed-by-delta-live-tables) is automatically paused (for both pipelines) while migration is in progress.
+- If an Autoloader source specifies an explicit `cloudFiles.schemaLocation`, `mergeSchema` needs to be set to `true` for the HMS original and the UC clone to operate concurrently.
+- Pipelines that publish tables to custom schemas are not supported.
+- On tables cloned to UC, time travel queries are undefined when querying by timestamp for versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp for versions written on UC.
+- [All existing limitations](https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations) of using DLT on UC apply.
+- [Existing UC limitations](https://docs.databricks.com/en/data-governance/unity-catalog/index.html#limitations) apply.
+- If tables in the HMS pipeline specify storage locations (using the `path` parameter in Python or the `LOCATION` clause in SQL), the configuration `pipelines.migration.ignoreExplicitPath` can be set to `true` to ignore the parameter in the cloned pipeline.
+
+### Considerations
+
+- Do not edit the notebooks that define the pipeline during cloning.
+- The original pipeline should not be running when requesting the clone.
+- When a clone is requested, DLT will automatically start an update to migrate the existing data and metadata for Streaming Tables, allowing them to pick up where the original pipeline left off.
+  - It is expected that the update metrics do not include the migrated data.
+- Make sure all name-based references in the HMS pipeline are fully qualified, e.g. `hive_metastore.schema.table`.
+- After the UC clone reaches the 'RUNNING' state, both the original pipeline and the cloned pipeline may run independently.
+
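
The fully-qualified-names consideration can be sketched in plain Python. The `qualify` helper below is hypothetical (it is not part of ucx) and assumes the pipeline's source tables live in the Hive Metastore:

```python
# Hypothetical helper: rewrite bare schema.table references to fully
# qualified catalog.schema.table names, as the consideration above requires.

def qualify(ref: str, default_catalog: str = "hive_metastore") -> str:
    """Return a fully qualified catalog.schema.table reference."""
    parts = ref.split(".")
    if len(parts) == 3:   # already catalog.schema.table: leave unchanged
        return ref
    if len(parts) == 2:   # schema.table: prepend the default catalog
        return f"{default_catalog}.{ref}"
    raise ValueError(f"cannot qualify ambiguous reference: {ref!r}")

print(qualify("schema.table"))       # hive_metastore.schema.table
print(qualify("main.schema.table"))  # main.schema.table (unchanged)
```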
 ## Final details
 
 Once you're done with the [code migration](#code-migration), you can run the:

docs/ucx/docs/reference/commands/index.mdx

Lines changed: 10 additions & 0 deletions
@@ -660,6 +660,16 @@ It takes a `WorkspaceClient` object and `from` and `to` parameters as parameters
 the `TableMove` class. This command is useful for developers and administrators who want to create an alias for a table.
 It can also be used to debug issues related to table aliasing.
 
+## Pipeline migration commands
+
+These commands are for [pipeline migration](/docs/process#delta-live-table-pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
+
+### `migrate-dlt-pipelines`
+
+```text
+$ databricks labs ucx migrate-dlt-pipelines [--include-pipeline-ids <comma separated list of pipeline ids>] [--exclude-pipeline-ids <comma separated list of pipeline ids>]
+```
+
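
For scripting, the flag syntax in the added usage line can be assembled programmatically. The `build_cmd` helper and the pipeline ids below are hypothetical, not part of ucx:

```python
# Hypothetical helper that assembles a `databricks labs ucx
# migrate-dlt-pipelines` invocation from pipeline-id lists.
# Flags mirror the usage line above; the ids are placeholders.

def build_cmd(include=None, exclude=None):
    cmd = ["databricks", "labs", "ucx", "migrate-dlt-pipelines"]
    if include:
        cmd += ["--include-pipeline-ids", ",".join(include)]
    if exclude:
        cmd += ["--exclude-pipeline-ids", ",".join(exclude)]
    return cmd

print(" ".join(build_cmd(include=["pipeline-a", "pipeline-b"])))
```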
 ## Utility commands
 
 ### `logs`
