Skip to content

Persist workflow problems in UCX multiworkspace catalog #3756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

JCZuurmond
Copy link
Contributor

@JCZuurmond JCZuurmond commented Feb 25, 2025

Changes

Persist workflow problems in UCX multiworkspace catalog:

  • Persist JobProblem ownership in UCX multiworkspace catalog
  • Refactor WorkflowLinter to be a crawler for similar API for persisting snapshot in mutlworkspace catalog
  • Introduce JobProblemOwnership to identify owners for job problems

Linked issues

Resolves #3237

Functionality

  • modified existing workflow: migration-progress

Tests

  • added unit tests
  • added integration tests

@JCZuurmond JCZuurmond added feat/viz vizualizing UCX progress as a redash/lakeview dashboard feat/migration-progress Issues related to the migration progress workflow labels Feb 25, 2025
@JCZuurmond JCZuurmond requested a review from asnare February 25, 2025 12:32
@JCZuurmond JCZuurmond self-assigned this Feb 25, 2025
Copy link
Contributor Author

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asnare : Would like you guidance on this PR

@@ -45,10 +46,25 @@ class JobProblem:
end_line: int
end_col: int

__id_attributes__: ClassVar[tuple[str, ...]] = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asnare : This can only be str according to the current implementation. Do you know why? And/or what what to do with non-string fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are constrained because the mechanism we use to encode them as an identifier is only defined for when they're a string. I vaguely recall proposing JSON-encoding the sequence, but that was rejected.

Anything that isn't a string needs to be made available (unambiguously) as a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that isn't a string needs to be made available (unambiguously) as a string.

Can we made these available as string without changing the dataclass attributes? (We want to avoid that as this is persisted as table in the UCX inventory, thus difficult to update the data types.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read-only properties will work?

@@ -37,17 +38,28 @@
logger = logging.getLogger(__name__)


class WorkflowLinter:
def __init__(
class WorkflowLinter(CrawlerBase):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asnare : This "crawler" does not fit the usual crawler format as it "crawls" multiple resources: JobProblems, UsedTable and DirectFsAccess. You looked into this. What is your conclusion on handling this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a very difficult problem to solve: I'm not sure I reached a conclusion. I had a sense that we could use some sort of "composite" crawler than produced the different things all at once and then split them up, but never really went through the detailed effort of figuring out the details to graft those two worlds together.

@JCZuurmond JCZuurmond assigned pritishpai and unassigned JCZuurmond Feb 26, 2025
@JCZuurmond JCZuurmond moved this from Blocked/Hold to Todo in UCX Mar 4, 2025
@Mcdj123
Copy link

Mcdj123 commented Mar 11, 2025

@JCZuurmond

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/migration-progress Issues related to the migration progress workflow feat/viz vizualizing UCX progress as a redash/lakeview dashboard
Projects
Status: Todo
4 participants