-
Notifications
You must be signed in to change notification settings - Fork 96
Crawlers: append snapshots to history journal, if available #2743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 87 commits
Commits
Show all changes
89 commits
Select commit
Hold shift + click to select a range
15dd48d
Introduce an optional history log, where crawler snapshots are journa…
asnare 625f16a
Merge branch 'main' into crawler-snapshot-history
asnare 622792e
Switch to integer identifiers for run_id and snapshot_id.
asnare 3d31a1f
Merge branch 'main' into crawler-snapshot-history
asnare 4ebf9b6
Update to store object data with first-level attributes exposed as a …
asnare d82e06b
Use a 56-bit random number for the snapshot_id.
asnare 4d676bb
Switch to composite object identifier.
asnare cf48771
Formatting.
asnare e877532
Modify TODO to not trigger linter.
asnare b321903
Unit tests for crawler appending new snapshots to the history.
asnare 02fc40f
Merge branch 'main' into crawler-snapshot-history
asnare 6170a63
Ensure call-by-keyword, and indicate the return type.
asnare 4b55625
Fix unit test.
asnare 155cda7
Merge branch 'main' into crawler-snapshot-history
asnare 260becf
Back out changes to the crawler that relate to the history.
asnare 0efb3e5
Merge branch 'main' into crawler-snapshot-history
asnare 6e649af
Merge branch 'main' into crawler-snapshot-history
asnare a2dd1d5
Merge branch 'main' into crawler-snapshot-history
asnare f897d98
Merge branch 'main' into crawler-snapshot-history
asnare 4a5321c
Replace initial history record conversion and logging with a new vers…
asnare 76aeefb
Merge branch 'main' into crawler-snapshot-history
asnare e321589
Mark inner dataclasses in tests as immutable, so they are safe to use…
asnare 4e3e3c8
Fix type hint.
asnare 616b0f1
Mark test class as immutable.
asnare a52720d
Unit tests for the failures[] mechanism.
asnare 6f38192
When encoding a string field, also handle it being optional.
asnare af1fff2
Fix comparison.
asnare 724abb2
Unit tests for the history log.
asnare dd73486
Remove dead code.
asnare 31740dd
Merge branch 'main' into crawler-snapshot-history
asnare 00d5d85
Type hint.
asnare 57f63ba
Test detection of naive timestamps during encoding.
asnare 3aa2431
Rename test argument to avoid shadowing a global.
asnare 4c86ba6
Unit test for handling unserializable values.
asnare 5c27d8b
Merge branch 'main' into crawler-snapshot-history
asnare 39f1105
Inline trivial method.
asnare 1d50ce1
Update error message on unserializable value to provide more context.
asnare d431322
Merge branch 'main' into crawler-snapshot-history
asnare 95d2a48
Merge branch 'main' into crawler-snapshot-history
asnare 4ce4ce4
Rename for consistency.
asnare 7441077
Update HistoryLog initializer: it doesn't need a workspace client.
asnare 07bc711
Update to object_id support to allow properties as well as fields.
asnare 3b3e5bd
Tweak type signature: the snapshot to append can be any iterable type.
asnare a9dd77f
Unit tests for logging Table records into the history log.
asnare 0f9a4cd
Update Grants to support logging into the history log.
asnare ad482b8
Update the migration progress workflow to log tables and grants to th…
asnare fd2b3ac
Unit tests for a migration-progress task that wasn't covered yet.
asnare 9d4ed60
Merge branch 'main' into crawler-snapshot-history
asnare 3703848
Support classes whose failures attribute is a string (containing JSON…
asnare 62390b5
Ensure updated UDF snapshots are logged to the history table.
asnare c1fea83
Naming consistency.
asnare 8c70541
Fix copypasta.
asnare 6f41fc0
Ensure updated JobInfo snapshots are appended to the history log.
asnare 927392a
Fix type hints.
asnare 78f592d
Ensure updated ClusterInfo snapshots are stored in the history table.
asnare 6dfed4c
Fix some test names.
asnare 34b1e72
Ensure updated PipelineInfo snapshots are appended to the history table.
asnare 77a06e1
Ensure updated Cluster Policy snapshots are logged to the history table.
asnare ac60675
Formatting.
asnare f00a06e
Fix docstring.
asnare 4d6989c
Ensure that updated TableMigrationStatus snapshots are appended to th…
asnare 17100d5
Update query to return at most a single record.
asnare 78a9a13
Update integration test to verify that the history log is written to.
asnare 21ad44d
Update history log to write to multiworkspace.historical instead of u…
asnare febc9ce
Ensure all tasks run on a UC-enabled cluster.
asnare 383027d
Merge branch 'main' into crawler-snapshot-history
asnare e223612
Formatting.
asnare c938dd8
Split the crawling and history-log update across 2 tasks for the tabl…
asnare 8a378dc
Note a limitation of the fast-scan table crawler.
asnare 2e5869d
Factor out the ownership components so they can be used elsewhere.
asnare bf542a4
Mark the __id_attributes__ sequence as immutable.
asnare 3aad8bf
Mark linters as still needing to be done.
asnare a990327
Merge branch 'main' into crawler-snapshot-history
asnare 2381d18
Handle UDF failures, which aren't JSON-encoded as with other classes.
asnare 23e7720
Sort imports.
asnare 567d809
Test case (and fix) for when __id_attributes__ is annotated as None.
asnare f0bd963
Docstring explaining HistoricalEncoder design and intent.
asnare 5b2ab77
Rename some things to align more closely with what they are.
asnare 0ff2e95
Remove redundant type alternative.
asnare 971cb4c
Docstring wording and formatting updates.
asnare be16738
Mention use-case for these records.
asnare 2b9b476
Clarify reason for assumption.
asnare f17dc74
Document reason for non-default JSON separators.
asnare 823c886
Detect and handle non-string values being passed in string-hinted fie…
asnare 7377727
Merge branch 'main' into crawler-snapshot-history
asnare fdb03b2
Handle the remaining lsql-supported fields types.
asnare b7af858
All tasks in the workflow are supposed to depend on the assessment ha…
asnare 9b489b9
Explicitly declare the dependency of the record_workflow_run on the t…
asnare 0a2f4f2
Use correct method for converting rows to dictionaries.
asnare File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically dunder signals Python builtin. I prefer only leading
__
underscoreThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use just a leading
__
: that makes it a private attribute, and it can't be accessed from outside the instance. (Dunder attributes are special: although they start with__
that are not subject to mangling like ordinary fields. Arguably they signal a protocol in use behind the curtains, which indeed is what Python mainly uses them for.)In this case alternatives are:
_id_attributes: ClassVar[tuple[str, ...]]
: Protected, mildly cranky linter ("access to protected member of a client class").id_attributes: ClassVar[tuple[str, ...]]
: Public, which doesn't feel right for what is essentially metadata that we're trying to attach to the instance.A bit further from home, is something like the SQLAlchemy approach where metadata (such as this) is placed in a sibling-class. That's a pretty large hammer to hit this particular nail though.
Any further thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this confusion could have been avoided by using a method-based protocol...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with this for now