-
Notifications
You must be signed in to change notification settings - Fork 93
Persist workflow problems in UCX multiworkspace catalog #3756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asnare : Would like you guidance on this PR
@@ -45,10 +46,25 @@ class JobProblem: | |||
end_line: int | |||
end_col: int | |||
|
|||
__id_attributes__: ClassVar[tuple[str, ...]] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asnare : This can only be str
according to the current implementation. Do you know why? And/or what what to do with non-string fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are constrained because the mechanism we use to encode them as an identifier is only defined for when they're a string. I vaguely recall proposing JSON-encoding the sequence, but that was rejected.
Anything that isn't a string needs to be made available (unambiguously) as a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that isn't a string needs to be made available (unambiguously) as a string.
Can we made these available as string without changing the dataclass attributes? (We want to avoid that as this is persisted as table in the UCX inventory, thus difficult to update the data types.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think read-only properties will work?
@@ -37,17 +38,28 @@ | |||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
class WorkflowLinter: | |||
def __init__( | |||
class WorkflowLinter(CrawlerBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asnare : This "crawler" does not fit the usual crawler format as it "crawls" multiple resources: JobProblems
, UsedTable
and DirectFsAccess
. You looked into this. What is your conclusion on handling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a very difficult problem to solve: I'm not sure I reached a conclusion. I had a sense that we could use some sort of "composite" crawler than produced the different things all at once and then split them up, but never really went through the detailed effort of figuring out the details to graft those two worlds together.
Changes
Persist workflow problems in UCX multiworkspace catalog:
JobProblem
ownership in UCX multiworkspace catalogWorkflowLinter
to be a crawler for similar API for persisting snapshot in mutlworkspace catalogJobProblemOwnership
to identify owners for job problemsLinked issues
Resolves #3237
Functionality
migration-progress
Tests