Skip to content

Crawler: support for object ownership #2774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 66 commits into from
Oct 9, 2024
Merged

Crawler: support for object ownership #2774

merged 66 commits into from
Oct 9, 2024

Conversation

asnare
Copy link
Contributor

@asnare asnare commented Oct 1, 2024

Changes

As part of #2761 we need to have a way for determining the user responsible for some of our inventory types. This PR updates the crawler framework so that:

  • There is a way to identify an owner of a resource referred to by inventory records.
  • When the owner cannot be identified, a workspace or account administrator is used instead.

Linked issues

Progresses #2761.

Functionality

  • A component for locating an administrator user.

  • Ownership information for the following inventory types:

    • ClusterInfo
    • DirectFsAccess (stubbed)
    • Grant
    • JobInfo
    • PipelineInfo
    • PolicyInfo
    • Table
    • TableMigrationStatus
    • UDF

Tests

  • added unit tests
  • added integration tests

This involves wiring up all the crawlers to have a workspace client, needed to locate the workspace administrator when an owner for the object cannot be found.
@asnare asnare added the enhancement New feature or request label Oct 1, 2024
@asnare asnare self-assigned this Oct 1, 2024
Iterable[User]: The active workspace administrators, if any.
"""
all_users = ws.users.list(attributes="id,active,userName,roles")
return (user for user in all_users if user.active and _has_role(user, "workspace_admin"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have that role and it's no longer the admins group membership?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was just a guess while I figured out how it's supposed to be done. (And indeed, it now uses membership of the admins workspace group.)

found.
"""
first_user = functools.partial(min, default=None, key=lambda user: user.name)
return first_user(find_workspace_admins(ws)) or first_user(find_account_admins(ws))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't forget to sort, because each invocation first user will be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is already sorting1: head(sort(…))min(…)

Footnotes

  1. It was using the wrong attribute; it should be user.user_name and that's now fixed.

asnare added 26 commits October 3, 2024 10:37
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite assert "@" in ownership.owner_of(grant_record) to check for concrete usernames as much as possible, as these tests don't test enough - e.g. hiding the problem

# Verify ownership is as expected.
ownership = ClusterOwnership(runtime_ctx.administrator_locator)
assert ownership.owner_of(cluster_record_with_owner) == ws.current_user.me().user_name
assert "@" in ownership.owner_of(cluster_record_without_owner)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exact user name match is preferred

Comment on lines +47 to +48
def test_pipeline_ownership(ws, runtime_ctx, make_pipeline, inventory_schema, sql_backend) -> None:
"""Verify the ownership can be determined for crawled pipelines."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_pipeline_ownership(ws, runtime_ctx, make_pipeline, inventory_schema, sql_backend) -> None:
"""Verify the ownership can be determined for crawled pipelines."""
def test_verify_ownership_determined_for_crawled_pipelines(ws, runtime_ctx, make_pipeline, inventory_schema, sql_backend) -> None:


# Verify ownership can be made.
ownership = GrantOwnership(runtime_ctx.administrator_locator)
assert "@" in ownership.owner_of(grant_record)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite assert "@" in ownership.owner_of(grant_record) to check for concrete usernames as much as possible, as these tests don't test enough - e.g. hiding the problem

asnare added 2 commits October 9, 2024 12:12
For some reason deleting the owner of a cluster doesn't clear the creator field, even though the documentation says it should. Instead we just check that the creator is actually returned.
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nfx nfx merged commit 35d04aa into main Oct 9, 2024
4 of 7 checks passed
@nfx nfx deleted the inventory-object-owners branch October 9, 2024 12:15
@nfx nfx added the internal this pull request won't appear in release notes label Oct 9, 2024
nfx pushed a commit that referenced this pull request Oct 9, 2024
## Changes

This updates an integration test to more thoroughly verify that we can
locate the user-name of an administrator.

### Linked issues

Follows: #2774

### Tests

 - updated integration test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request internal this pull request won't appear in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants