Migration progress: include DFSA records in the history log #3039

Closed
wants to merge 101 commits into from

Conversation

asnare
Contributor

@asnare asnare commented Oct 22, 2024

Changes

This PR follows on from #2743 by extending the set of updates that we capture to include updated DirectFsAccess snapshots for dashboards and jobs.

Linked issues

Follows #2743.

Functionality

  • modified existing workflow: migration-progress-experimental

Tests

  • added unit tests
  • existing integration tests

This field is encoded as a Spark SQL LONG, which is a signed 64-bit integer.
The history will be maintained adjacent to the crawler framework.
…ion, updated to use the Historical record type.

github-actions bot commented Oct 22, 2024

❌ 49/50 passed, 1 failed, 2 skipped, 1h7m16s total

❌ test_running_real_migration_progress_job: AssertionError: Workflow failed: migration-progress-experimental (31m40.612s)
AssertionError: Workflow failed: migration-progress-experimental
assert False
 +  where False = validate_step('migration-progress-experimental')
 +    where validate_step = <databricks.labs.ucx.installer.workflows.DeployedWorkflows object at 0x7f7e5b6aeb60>.validate_step
 +      where <databricks.labs.ucx.installer.workflows.DeployedWorkflows object at 0x7f7e5b6aeb60> = <tests.integration.conftest.MockInstallationContext object at 0x7f7e5b61c610>.deployed_workflows
[gw1] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
08:29 INFO [tests.integration.conftest] Dashboard Created ucx_DwHym_ra78a55a0a: https://DATABRICKS_HOST/sql/dashboards/c71e0587-e22c-40e8-9dd5-fb014b2d4981
08:29 INFO [tests.integration.conftest] Dashboard Created ucx_DPzY4_ra78a55a0a: https://DATABRICKS_HOST/sql/dashboards/49c8d7f5-d2fa-4435-ac3e-1d0207534e78
08:29 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.Mq6p/config.yml) doesn't exist.
08:29 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
08:29 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
08:29 INFO [databricks.labs.ucx.install] Fetching installations...
08:29 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
08:29 DEBUG [tests.integration.conftest] Waiting for clusters to start...
08:29 DEBUG [tests.integration.conftest] Waiting for clusters to start...
08:29 INFO [databricks.labs.ucx.install] Installing UCX v0.47.1+11320241024082916
08:29 INFO [databricks.labs.ucx.install] Creating ucx schemas...
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
08:29 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
08:29 INFO [databricks.labs.ucx.install] Creating dashboards...
08:29 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
08:29 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
08:29 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
08:29 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
08:29 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.Mq6p/README for the next steps.
08:29 DEBUG [databricks.labs.ucx.installer.workflows] starting assessment job: https://DATABRICKS_HOST#job/420836374142604
08:29 INFO [databricks.labs.ucx.installer.workflows] Started assessment job: https://DATABRICKS_HOST#job/420836374142604/runs/491727566136780
08:29 DEBUG [databricks.labs.ucx.installer.workflows] Waiting for completion of assessment job: https://DATABRICKS_HOST#job/420836374142604/runs/491727566136780
08:43 INFO [databricks.labs.ucx.installer.workflows] Completed assessment job run 491727566136780 with state: RunResultState.SUCCESS
08:43 INFO [databricks.labs.ucx.installer.workflows] Completed assessment job run 491727566136780 duration: 0:13:21.825000 (2024-10-24 08:29:39.747000+00:00 thru 2024-10-24 08:43:01.572000+00:00)
08:43 DEBUG [databricks.labs.ucx.installer.workflows] Validating assessment workflow: https://DATABRICKS_HOST#job/420836374142604
08:43 INFO [databricks.labs.ucx.progress.install] Installation completed successfully!
08:43 DEBUG [databricks.labs.ucx.installer.workflows] starting migration-progress-experimental job: https://DATABRICKS_HOST#job/728082396829929
08:43 INFO [databricks.labs.ucx.installer.workflows] Started migration-progress-experimental job: https://DATABRICKS_HOST#job/728082396829929/runs/1014998267246221
08:43 DEBUG [databricks.labs.ucx.installer.workflows] Waiting for completion of migration-progress-experimental job: https://DATABRICKS_HOST#job/728082396829929/runs/1014998267246221
08:59 INFO [databricks.labs.ucx.installer.workflows] Completed migration-progress-experimental job run 1014998267246221 with state: RunResultState.SUCCESS_WITH_FAILURES (The job run succeeded with 11 failed tasks)
08:59 INFO [databricks.labs.ucx.installer.workflows] Completed migration-progress-experimental job run 1014998267246221 duration: 0:15:41.134000 (2024-10-24 08:43:14.549000+00:00 thru 2024-10-24 08:58:55.683000+00:00)
08:59 DEBUG [databricks.labs.ucx.installer.workflows] Validating migration-progress-experimental workflow: https://DATABRICKS_HOST#job/728082396829929
08:59 INFO [databricks.labs.ucx.install] Deleting UCX v0.47.1+11320241024082916 from https://DATABRICKS_HOST
08:59 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_sjsac
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=799380929741655, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=728082396829929, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=381121446150907, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=984284884208057, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=148293406372514, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=733738560341341, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=835646867814657, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=228396573518795, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=921159314539989, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=301197383412333, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=296918394603234, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=420836374142604, as it is no longer needed
08:59 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=604828094927882, as it is no longer needed
08:59 INFO [databricks.labs.ucx.install] Deleting cluster policy
08:59 INFO [databricks.labs.ucx.install] Deleting secret scope
08:59 INFO [databricks.labs.ucx.install] UnInstalling UCX complete

Running from acceptance #7013

@asnare
Contributor Author

asnare commented Oct 23, 2024

❌ test_running_real_migration_progress_job: AssertionError: Workflow failed: migration-progress-experimental (25m43.878s)

This is currently failing due to a bug in the crawlers that means the snapshots cannot be loaded when the Spark-based runtime is being used; fixed in #3046.

Base automatically changed from crawler-snapshot-history to main October 23, 2024 12:27
nfx pushed a commit that referenced this pull request Oct 23, 2024
## Changes

This PR fixes an issue with the DFSA and used-table crawlers that could
prevent the snapshots from loading. When loading, they convert the rows to
dictionaries using `.as_dict()`, which isn't available on rows provided
by the Spark-based lsql backend. Instead `.asDict()` needs to be used.
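
To illustrate the mismatch described above, here is a minimal sketch. The two row classes are hypothetical stand-ins written for this example (they are not the lsql or PySpark classes), and `row_to_dict` is an assumed helper name; this is not the change made in the PR, which simply switches to `.asDict()`.

```python
from typing import Any


class SnakeCaseRow:
    """Hypothetical stand-in for a row type that exposes as_dict()."""

    def __init__(self, **values: Any) -> None:
        self._values = values

    def as_dict(self) -> dict[str, Any]:
        return dict(self._values)


class SparkStyleRow:
    """Hypothetical stand-in for a row type that only exposes asDict()."""

    def __init__(self, **values: Any) -> None:
        self._values = values

    def asDict(self) -> dict[str, Any]:
        return dict(self._values)


def row_to_dict(row: Any) -> dict[str, Any]:
    """Convert a row to a dictionary, whichever spelling the row supports."""
    as_dict = getattr(row, "as_dict", None)
    if callable(as_dict):
        return as_dict()
    # Fall back to the camel-case spelling used by Spark-style rows.
    return row.asDict()


snake = row_to_dict(SnakeCaseRow(path="dbfs:/mnt/a", is_read=True))
spark = row_to_dict(SparkStyleRow(path="dbfs:/mnt/a", is_read=True))
```

Calling `.as_dict()` directly on the second stand-in would raise `AttributeError`, which is the kind of failure the paragraph above describes.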

Incidental changes:
- An existing integration test was updated to also test snapshot loading
  for these crawlers.
- Another test was renamed to fix a typo in the name.

### Linked issues

Relates to #3036, #3039.

### Tests

- existing unit tests
- existing integration tests
Collaborator

@nfx nfx left a comment

lgtm

@asnare
Contributor Author

asnare commented Oct 24, 2024

Following a discussion, we've decided not to include DFSA records in their current form in the history table. Each DFSA record corresponds to a problem with another resource (e.g. notebooks, jobs). As such, the intent is to aggregate these records and include them in the list of failures on the resource-specific record.

nfx pushed a commit that referenced this pull request Oct 24, 2024
…ons in addition to the normal type-based ones (#3068)

## Changes

This PR cherry-picks some changes from #3039 that updated the
`HistoryEncoder` to work correctly with dataclasses that are declared with
`from __future__ import annotations` in effect.

When this import is in effect, Python stores all type hints as strings at
import/declaration time instead of evaluating them; they are only resolved
later, on request. (This is why forward references work.) Unfortunately the
dataclass mechanism captures field types prior to deferred resolution, so
each field's type is reported as a string. This PR ensures that our type
checking works anyway.
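
A minimal sketch of the behaviour described above, using only the standard library. The `Record` class and its fields are hypothetical, and the hints are quoted by hand to mimic what the future import does to every hint in a module; this is not the `HistoryEncoder` code.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Record:
    # Quoted hints stand in for the effect of deferred annotations:
    # the hint is stored as a string rather than as a type object.
    object_id: "int"
    failures: "list[str]"


# dataclasses.fields() reports each hint exactly as it was stored.
raw_types = {f.name: f.type for f in dataclasses.fields(Record)}

# typing.get_type_hints() evaluates the strings and returns real types.
resolved_types = typing.get_type_hints(Record)
```

Here `raw_types` holds strings while `resolved_types` holds type objects, so any check that compares a field's type against a real type needs to resolve the hints first (or compare against the string form as well).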

### Linked issues

Cherry-picks from #3039.

### Tests

- updated unit tests
Labels
enhancement New feature or request pr/do-not-merge this pull request is not ready to merge
2 participants