Commit 7b237a3

Remove WorkflowLinter as it is part of the Assessment workflow (#3036)
## Changes
Remove `WorkflowLinter` as it is part of the `Assessment` workflow

### Linked issues
Resolves #3035

### Functionality
- [x] removed workflow: `experimental-workflow-linter`

### Tests
- [x] manually tested

1 parent 2b4865e commit 7b237a3
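
With the standalone workflow removed, the linter output is produced only by the `assess_workflows` and `assess_dashboards` tasks of the assessment workflow. The four tables named in the README change below can be checked with the same kind of `COUNT(*)` query the deleted integration test used. A minimal sketch, assuming a hypothetical helper `count_queries` (not part of UCX) and an inventory database name supplied by the caller:

```python
# Hypothetical helper, not part of UCX: it only builds SQL strings.
# Table names are taken from the README changes in this commit.
LINTER_TABLES = (
    "workflow_problems",
    "query_problems",
    "directfs_in_paths",
    "directfs_in_queries",
)


def count_queries(inventory_database: str) -> dict[str, str]:
    """Return one row-count query per linter output table."""
    return {
        table: f"SELECT COUNT(*) AS count FROM {inventory_database}.{table}"
        for table in LINTER_TABLES
    }


if __name__ == "__main__":
    # "ucx" is an assumed inventory database name for illustration.
    for table, query in count_queries("ucx").items():
        print(f"{table}: {query}")
```

Each query string could then be passed to whatever SQL client is available; in the removed test this was `ctx.sql_backend.fetch(...)`.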

6 files changed: +38 -93 lines changed
README.md

Lines changed: 3 additions & 21 deletions
@@ -401,9 +401,8 @@ which can be used for further analysis and decision-making through the [assessme
 9. `assess_pipelines`: This task scans through all the Pipelines and identifies those pipelines that have Azure Service Principals embedded in their configurations. A list of all the pipelines with matching configurations is stored in the `$inventory.pipelines` table.
 10. `assess_azure_service_principals`: This task scans through all the clusters configurations, cluster policies, job cluster configurations, Pipeline configurations, and Warehouse configuration and identifies all the Azure Service Principals who have been given access to the Azure storage accounts via spark configurations referred in those entities. The list of all the Azure Service Principals referred in those configurations is saved in the `$inventory.azure_service_principals` table.
 11. `assess_global_init_scripts`: This task scans through all the global init scripts and identifies if there is an Azure Service Principal who has been given access to the Azure storage accounts via spark configurations referred in those scripts.
-12. `assess_dashboards`: This task scans through all the dashboards and analyzes embedded queries for migration problems. It also collects direct filesystem access patterns that require attention.
-13. `assess_workflows`: This task scans through all the jobs and tasks and analyzes notebooks and files for migration problems. It also collects direct filesystem access patterns that require attention.
-
+12. `assess_dashboards`: This task scans through all the dashboards and analyzes embedded queries for migration problems which it persists in `$inventory_database.query_problems`. It also collects direct filesystem access patterns that require attention which it persists in `$inventory_database.directfs_in_queries`.
+13. `assess_workflows`: This task scans through all the jobs and tasks and analyzes notebooks and files for migration problems which it persists in `$inventory_database.workflow_problems`. It also collects direct filesystem access patterns that require attention which it persists in `$inventory_database.directfs_in_paths`.
 
 ![report](docs/assessment-report.png)
 
@@ -726,27 +725,10 @@ in the Migration dashboard.
 
 [[back to top](#databricks-labs-ucx)]
 
-## Jobs Static Code Analysis Workflow
-
-> Please note that this is an experimental workflow.
-
-The `experimental-workflow-linter` workflow lints accessible code from 2 sources:
-- all workflows/jobs present in the workspace
-- all dashboards/queries present in the workspace
-The linting emits problems indicating what to resolve for making the code Unity Catalog compatible.
-The linting also locates direct filesystem access that need to be migrated.
-
-Once the workflow completes:
-- problems are stored in the `$inventory_database.workflow_problems`/`$inventory_database.query_problems` table
-- direct filesystem access are stored in the `$inventory_database.directfs_in_paths`/`$inventory_database.directfs_in_queries` table
-- all the above are displayed in the Migration dashboard.
+### Linter message codes
 
 ![code compatibility problems](docs/code_compatibility_problems.png)
 
-[[back to top](#databricks-labs-ucx)]
-
-### Linter message codes
-
 Here's the detailed explanation of the linter message codes:
 
 #### `cannot-autofix-table-reference`

docs/table_persistence.md

Lines changed: 28 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -4,34 +4,34 @@ List of all UCX objects and their respective metadata.
44

55
## Overview
66

7-
Table Utilization:
8-
9-
| Table | Generate Assessment | Update Migration Progress | Migrate Groups | Migrate External Tables | Upgrade Jobs | Migrate tables | Migrate Data Reconciliation | Workflow linter |
10-
|--------------------------|---------------------|---------------------------|----------------|-------------------------|--------------|----------------|-----------------------------|-----------------|
11-
| tables | RW | RW | | RO | | RO | | |
12-
| grants | RW | RW | | RW | | RW | | |
13-
| mounts | RW | | | RO | RO | RO | | |
14-
| permissions | RW | | RW | RO | | RO | | |
15-
| jobs | RW | RW | | | RO | | | |
16-
| clusters | RW | RW | | | | | | |
17-
| directfs_in_paths | RW | RW | | | | | | RW |
18-
| directfs_in_queries | RW | RW | | | | | | RW |
19-
| external_locations | RW | | | RO | | | | |
20-
| workspace | RW | | RO | | RO | | | |
21-
| workspace_objects | RW | | | | | | | |
22-
| azure_service_principals | RW | | | | | | | |
23-
| global_init_scripts | RW | | | | | | | |
24-
| pipelines | RW | RW | | | | | | |
25-
| groups | RW | | RO | | | | | |
26-
| table_size | RW | | | | | | | |
27-
| submit_runs | RW | | | | | | | |
28-
| policies | RW | RW | | | | | | |
29-
| migration_status | | RW | | RW | | RW | | |
30-
| query_problems | RW | RW | | | | | | RW |
31-
| workflow_problems | RW | RW | | | | | | RW |
32-
| udfs | RW | RW | RO | | | | | |
33-
| logs | RW | | RW | RW | | RW | RW | |
34-
| recon_results | | | | | | | RW | |
7+
Table utilization per workflow:
8+
9+
| Table | Generate Assessment | Update Migration Progress | Migrate Groups | Migrate External Tables | Upgrade Jobs | Migrate tables | Migrate Data Reconciliation |
10+
|--------------------------|---------------------|---------------------------|----------------|-------------------------|--------------|----------------|-----------------------------|
11+
| tables | RW | RW | | RO | | RO | |
12+
| grants | RW | RW | | RW | | RW | |
13+
| mounts | RW | | | RO | RO | RO | |
14+
| permissions | RW | | RW | RO | | RO | |
15+
| jobs | RW | RW | | | RO | | |
16+
| clusters | RW | RW | | | | | |
17+
| directfs_in_paths | RW | RW | | | | | |
18+
| directfs_in_queries | RW | RW | | | | | |
19+
| external_locations | RW | | | RO | | | |
20+
| workspace | RW | | RO | | RO | | |
21+
| workspace_objects | RW | | | | | | |
22+
| azure_service_principals | RW | | | | | | |
23+
| global_init_scripts | RW | | | | | | |
24+
| pipelines | RW | RW | | | | | |
25+
| groups | RW | | RO | | | | |
26+
| table_size | RW | | | | | | |
27+
| submit_runs | RW | | | | | | |
28+
| policies | RW | RW | | | | | |
29+
| migration_status | | RW | | RW | | RW | |
30+
| query_problems | RW | RW | | | | | |
31+
| workflow_problems | RW | RW | | | | | |
32+
| udfs | RW | RW | RO | | | | |
33+
| logs | RW | | RW | RW | | RW | RW |
34+
| recon_results | | | | | | | RW |
3535

3636
**RW** - Read/Write, the job generates or updates the table.<br/>
3737
**RO** - Read Only
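
The RW/RO matrix can also be read programmatically, for example to confirm that the linter tables still have a writer after the `Workflow linter` column is dropped. A minimal sketch, assuming a hand-copied subset of the updated table and a hypothetical `writers` helper (neither exists in UCX):

```python
# Subset of the updated table-utilization matrix, copied by hand from
# docs/table_persistence.md after this commit. Illustrative only.
UTILIZATION: dict[str, dict[str, str]] = {
    "tables": {
        "Generate Assessment": "RW",
        "Update Migration Progress": "RW",
        "Migrate External Tables": "RO",
        "Migrate tables": "RO",
    },
    "directfs_in_paths": {"Generate Assessment": "RW", "Update Migration Progress": "RW"},
    "directfs_in_queries": {"Generate Assessment": "RW", "Update Migration Progress": "RW"},
    "query_problems": {"Generate Assessment": "RW", "Update Migration Progress": "RW"},
    "workflow_problems": {"Generate Assessment": "RW", "Update Migration Progress": "RW"},
    "recon_results": {"Migrate Data Reconciliation": "RW"},
}


def writers(table: str) -> list[str]:
    """Return the workflows marked RW (generate or update) for a table."""
    return sorted(wf for wf, mode in UTILIZATION[table].items() if mode == "RW")


if __name__ == "__main__":
    for name in ("workflow_problems", "directfs_in_paths"):
        print(name, "is written by", writers(name))
```
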

src/databricks/labs/ucx/assessment/workflows.py

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -186,13 +186,17 @@ def crawl_groups(self, ctx: RuntimeContext):
186186
@job_task
187187
def assess_dashboards(self, ctx: RuntimeContext):
188188
"""Scans all dashboards for migration issues in SQL code of embedded widgets.
189-
Also stores direct filesystem accesses for display in the migration dashboard."""
189+
190+
Also, stores direct filesystem accesses for display in the migration dashboard.
191+
"""
190192
ctx.query_linter.refresh_report(ctx.sql_backend, ctx.inventory_database)
191193

192194
@job_task
193195
def assess_workflows(self, ctx: RuntimeContext):
194-
"""Scans all jobs for migration issues in notebooks.
195-
Also stores direct filesystem accesses for display in the migration dashboard."""
196+
"""Scans all jobs for migration issues in notebooks jobs.
197+
198+
Also, stores direct filesystem accesses for display in the migration dashboard.
199+
"""
196200
ctx.workflow_linter.refresh_report(ctx.sql_backend, ctx.inventory_database)
197201

198202

src/databricks/labs/ucx/runtime.py

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
)
2121
from databricks.labs.ucx.progress.workflows import MigrationProgress
2222
from databricks.labs.ucx.recon.workflows import MigrationRecon
23-
from databricks.labs.ucx.source_code.workflows import ExperimentalWorkflowLinter
2423
from databricks.labs.ucx.workspace_access.workflows import (
2524
GroupMigration,
2625
PermissionsMigrationAPI,
@@ -58,7 +57,6 @@ def all(cls):
5857
ScanTablesInMounts(),
5958
MigrateTablesInMounts(),
6059
PermissionsMigrationAPI(),
61-
ExperimentalWorkflowLinter(),
6260
MigrationRecon(),
6361
Failing(),
6462
]

src/databricks/labs/ucx/source_code/workflows.py

Lines changed: 0 additions & 19 deletions
This file was deleted.

tests/integration/source_code/test_jobs.py

Lines changed: 0 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -32,26 +32,6 @@
3232
from tests.unit.source_code.test_graph import _TestDependencyGraph
3333

3434

35-
@retried(on=[NotFound], timeout=timedelta(minutes=5))
36-
def test_running_real_workflow_linter_job(installation_ctx, make_job) -> None:
37-
# Deprecated file system path in call to: /mnt/things/e/f/g
38-
job = make_job(content="spark.read.table('a_table').write.csv('/mnt/things/e/f/g')\n")
39-
ctx = installation_ctx.replace(config_transform=lambda wc: replace(wc, include_job_ids=[job.job_id]))
40-
ctx.workspace_installation.run()
41-
ctx.deployed_workflows.run_workflow("experimental-workflow-linter")
42-
ctx.deployed_workflows.validate_step("experimental-workflow-linter")
43-
44-
# This test merely tests that the workflows produces records of the expected types; record content is not checked.
45-
cursor = ctx.sql_backend.fetch(f"SELECT COUNT(*) AS count FROM {ctx.inventory_database}.workflow_problems")
46-
result = next(cursor)
47-
if result['count'] == 0:
48-
installation_ctx.deployed_workflows.relay_logs("experimental-workflow-linter")
49-
assert False, "No workflow problems found"
50-
dfsa_records = installation_ctx.directfs_access_crawler_for_paths.snapshot()
51-
used_table_records = installation_ctx.used_tables_crawler_for_paths.snapshot()
52-
assert dfsa_records and used_table_records
53-
54-
5535
@retried(on=[NotFound], timeout=timedelta(minutes=2))
5636
def test_linter_from_context(simple_ctx, make_job) -> None:
5737
# This code is similar to test_running_real_workflow_linter_job, but it's executed on the caller side and is easier
