Skip to content

Fix infrahub/#8 #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Oct 17, 2024
Merged

Fix infrahub/#8 #7

merged 14 commits into from
Oct 17, 2024

Conversation

PhillSimonds
Copy link

Make infrahubctl transform and render commands use InfrahubClient to communicate with GQL API.

Make infrahubctl transform and render commands use InfrahubClient to communicate with GQL API.
@PhillSimonds
Copy link
Author

Happy to add some unit/integration tests here if beneficial.

Copy link
Contributor

@wvandeun wvandeun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PhillSimonds, first of all, Thank you for your contribution!
I left some comments. If they are not clear to you, please reach out.

@PhillSimonds
Copy link
Author

Thanks for the comments Wim. I've added to the comments and updated per your feedback. I also renamed the town crier doc to reflect the new issue number now that the issue has been moved to this repo.

Let me know your thoughts on changing the transform run logic vs doing something else. Happy to update if there is a better way.

Also, this CLI command was broken when I picked up the PR. Would you like me to add an integration test to protect this in the future?

Phillip Simonds added 4 commits September 23, 2024 13:00
The client is assigned now via the cached_property.
We don't want a value passed into the function that isn't used, and likewise
don't want to change the API for that init function.
The collect_data method has been updated to support acquiring data from both a stored graphql query endpoint
and the standard graphql endpoint. Before, only the stored api endpoint was supported.
@PhillSimonds
Copy link
Author

PhillSimonds commented Sep 25, 2024

Wim. I updated the InfrahubTransform so that it will try to query the stored graphql query path first and query the graphql API directly in the event the stored query doesn't exist. That allows us to just call .run() on the transform class. I think that's the most elegant way to do this.

Another thought here -- there seems to be some confusion in the InfrahubTransform class's understanding of "branch". It's getting the branch name from the local repo if a branch name isn't passed in, but the branch name is actually used to ask for data from the specified branch in infrahub. I believe the repo branch and infrahub branch are completely separate, so we probably shouldn't be using a repo object to look at the local branch then pass that in the API request. the .branch_name property in the InfrahubTransform is only used in calls to the infrahub API from everything I see inside the class. I guess some external component could be using it to introspect the branch name of the local branch. Not sure if this should be remediated.

@petercrocker petercrocker requested review from a team and removed request for wvandeun September 30, 2024 13:57
@petercrocker
Copy link
Contributor

@opsmill/backend please review this one from @PhillSimonds !

@ogenstad
Copy link
Contributor

Hey @PhillSimonds, thank you for your contribution!

Also, this CLI command was broken when I picked up the PR. Would you like me to add an integration test to protect this in the future?

More testing overall would be great :) The CTL specifically is missing test coverage. In what way was the command currently broken?

Looking through the code to initialize transforms I realize that we need to do some refactoring with regards to how it's done as we have multiple ways of doing it at present. There is a small caveat with the PR you've done with regards to how the ctl currently expects a client to be setup in the way that the config would look a bit different if we used the client returned by a call to initialize_client()

As this currently targeting the stable branch and I think larger changes are needed I think it would be good to lower the scope and number of changes on this one. I'll open another issue to describe some thoughts I had in mind for these changes but they should probably be done against develop as this is more of a small bugfix.

It would be impossible to know but we're going to remove the part with server_url as that's never used within the codebase today.

I think what we need to do here is to change the initializer:

class InfrahubTransform:
    name: Optional[str] = None
    query: str
    timeout: int = 10

    def __init__(self, branch: str = "", root_directory: str = "", server_url: str = "", client: Optional[InfrahubClient] = None):
        self._client: InfrahubClient

Then have a property for .client which currently acts as a setter/getter (later the client will be a required argument). Here the same thing could be done with self.git as it looked like you find a bug around that if that was the issue?

Then here:

    try:
        transform_instance = get_transform_class_instance(transform_config=transform_config)
    except InfrahubTransformNotFoundError as exc:
        console.print(f"Unable to load {transform_name} from python_transforms")
        raise typer.Exit(1) from exc

We'd add a:

transform_instance.client = the-client-object

A problem here with the current code could be that initialize_client is an async function even though there's no need for it to be. We should be able to change it to a sync function. Alternatively we can cheat and run initialize_client_sync and convert it:

sync_client = initialize_client_sync()
async_client = InfrahubClient(config=sync_client.config)

Does this make sense to you?

@PhillSimonds
Copy link
Author

PhillSimonds commented Oct 1, 2024

Happy to contribute :).

More testing overall would be great :) The CTL specifically is missing test coverage.

Awesome, I'll add some tests.

In what way was the command currently broken?

Execution of the graphQL query was failing because the query wasn't loaded into the repository config object, and thus get_query($name) could not get the query.

*Further calls up the stack trace omitted for brevity

  File ".../infrahub-sdk-python/infrahub_sdk/ctl/utils.py", line 106, in
execute_graphql_query
    query_object = repository_config.get_query(name=query)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../infrahub-sdk-python/infrahub_sdk/schema.py", line 228, in
get_query
    return self._get_resource(resource_id=name, resource_type=InfrahubRepositoryGraphQLConfig)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../infrahub_sdk/schema.py", line 192, in
_get_resource
    raise KeyError(f"Unable to find {resource_id!r} in {RESOURCE_MAP!r}")
KeyError: "Unable to find 'tags_query' in 'queries'"

Here the same thing could be done with self.git as it looked like you find a bug around that if that was the issue?

The .git attribute wasn't actually causing an issue on execution of code because we were passing a branch name in, so the block that would have caused an issue with accessing .git wasn't being hit even though it's a bug that could blow things up. There are two things I think we should consider with this object:

  1. a .git attribute doesn't always exist (the bug I found). Because of this the branch_name property on the InfrahubTransform would have raised an error if no branch_name was passed in at object instantiation because it would have tried to access .git in order to acquire it. I fixed this by modifying the branch_name property such that it uses hasattr(self, git) to determine if the git attribute is specified before trying to determine if it's set to None (and then instantiate a new Repo() object if this is the case). This is probably better fixed by the solution you recommended, making git an optional parameter we can pass in at object instantiation, then using the property as a getter/setter.

  2. The logic behind this git object doesn't make sense to me and seems like it could cause an issue with code execution at some point. The .git attribute is a local Repo() object, but it is being used as if it were the branch_name to query in infrahub. To me, these should be two completely separate things -- but I may be misunderstanding something here.

**I filed a separate issue for the git stuff, #49. I am thinking it might make sense to create a separate PR for its full resolution, but I'm happy to integrate into this one if you think it would be better.

Does this make sense to you?

I think I'm following. I'll make some updates here shortly then ping for further review :)

- Makes  a synchronous function
- Make .client property of InfrahubTransform a setter/getter
- Add client as argument for initialization of InfrahubTransform
@PhillSimonds
Copy link
Author

PhillSimonds commented Oct 1, 2024

@ogenstad I've made the following updates:

  • InfrahubTransform now takes an optional client argument (of type infrahubClient) in it's init method. This is assigned to ._client
  • I've modified .client from a cached_property to a standard property. It gets ._client if ._client exists, otherwise it sets it to a new instance of InfrahubClient using the server URL
  • the initialize_client utility has been modified to make it synchronous. I changed all places in the code where this was called by removing the await keyword. This was done using grep. Mypy also flagged errors in the event the newly synchronous function call was being awaited instead of called directly. These two checks in mind -- I'm pretty sure I caught and modified all calls to initialize_client
  • We now create a client using initialize_client in the infrahubctl transform command, then pass that client into InfrahubTransform at instantiation. I like doing it this way more then assigning the .client attribute post-instantiation as we're not overriding the getter/setter this way.

Copy link
Contributor

@ogenstad ogenstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'd like to remove some of the additions in this PR and focus on the main cause we're trying to solve with the bug and instead raise other issues to discuss other improvements such as the loading of queries if that is currently problematic.

Feel free to reach out if you want a quick sync around this!

if not self.repository_config:
raise
query_str = self.repository_config.get_query(name=self.query).load_query()
return await self.client.execute_graphql(query=query_str, variables=variables, branch_name=self.branch_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to revert to the old behaviour here so that the focus of this PR is only to add the missing InfrahubClient that was the target of #8.

While I can see that the collect_data is missing some information as it is now I'm not sure that we should add a InfrahubRepositoryConfig object to the transform itself in order to load a missing query. I think we should handle all that in a different way. I'm curious to the root cause of this if you had problems running with a local query if it had to do with the fact that the query wasn't defined within your .infrahub.yml file? This is my assumption based on the repository_config.queries.append(query_config_obj) could this be correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query was indeed not in my infrahub.yml file (hence the loading of the query in L#352-354 of cli_commands.py, which can be removed with this in mind) -- but that isn't the root of the issue with running the transform here.

The InfrahubTransform object in stable only has access to the query name, not to the query string. The query name is able to be used in the front end because a stored query at the path f"{self.address}/api/query/{name}" pre-exists in infrahub which returns data, so we don't actually need the query string to make the query, just the name of the query. I think this stored query path is created as part of syncing a repo (though I'm not sure). This is not the case when the infrahubctl transform command is run, because the stored query path does not pre-exist in infrahub (unless I'm doing something wrong here). This in mind, we need the full query string to execute the query with infrahubctl transform, not just the query name.

HTTP communication failure: Client error '403 Forbidden' for url 'http://localhost:8000/api/query/tags_query?branch=main&update_group=false'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403 on POST to 
http://localhost:8000/api/query/tags_query?branch=main&update_group=false

I can add a client object to the transform (as per #8), but I'll have to call the client object's .execute_graphql method directly with the query string in order to acquire data, circumventing the transform's run method for data acquisition. I can then run the transform with that data. Is this what you're thinking would be best to resolve #8 with a more modest touch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated per my comment there. If this isn't what you're thinking (or I'm missing something) LMK and I can modify.

Copy link
Contributor

@ogenstad ogenstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I left a minor comment about a slight typo.

I also opened #54, which I think might have been some source of confusion for you.

@ogenstad
Copy link
Contributor

ogenstad commented Oct 4, 2024

Regarding the debug: #56

@PhillSimonds PhillSimonds changed the title Fix infrahub/#4143 Fix infrahub/#8 Oct 15, 2024
@PhillSimonds
Copy link
Author

@ogenstad I've added some tests for the infrahubctl transform command. I believe this should be good to go, let me know if I need to add or remove something more.

@PhillSimonds PhillSimonds requested a review from ogenstad October 15, 2024 22:44
Copy link
Contributor

@ogenstad ogenstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes!

@ogenstad ogenstad merged commit e91bd2c into opsmill:stable Oct 17, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants