Create shared schemas collector for DBM integrations #21720
| with self._get_cursor(database_name) as cursor: | ||
| # Get the next row from the cursor | ||
| next = self._get_next(cursor) | ||
| while next: |
PyMySQL and psycopg cursors are iterable, and I suspect the same holds for the SQL Server drivers. We should be able to iterate the cursor directly and avoid the need for _get_next(). Did you try something like the following?
| - with self._get_cursor(database_name) as cursor: | |
| - # Get the next row from the cursor | |
| - next = self._get_next(cursor) | |
| - while next: | |
| + with self._get_cursor(database_name) as cursor: | |
| + for next in cursor: | |
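A minimal sketch of the suggested loop, using the standard-library sqlite3 driver as a stand-in so the example runs without a database server (that substitution is an assumption, not what the PR uses). Cursors that implement the iterator protocol yield one row per iteration, so no explicit fetch helper is needed. `iter_rows` is a hypothetical name, and the loop variable is called `row` rather than `next` to avoid shadowing the builtin.

```python
import sqlite3
from contextlib import closing


def iter_rows(connection, query):
    """Yield rows by iterating the cursor directly instead of calling a fetch helper."""
    # closing() guarantees cursor.close() runs even if the consumer stops early.
    with closing(connection.cursor()) as cursor:
        cursor.execute(query)
        # `row` rather than `next`, so the builtin next() is not shadowed.
        for row in cursor:
            yield row


# Stand-in data: an in-memory table listing two table names.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tables (name TEXT)")
connection.executemany("INSERT INTO tables VALUES (?)", [("users",), ("orders",)])
rows = list(iter_rows(connection, "SELECT name FROM tables ORDER BY name"))
```

One thing to check before switching: if the loop has to know whether the current row is the last one (the surrounding code passes is_last_payload to maybe_flush), direct iteration would likely need a one-row look-ahead.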
| self.maybe_flush(is_last_payload) | ||
| except Exception as e: | ||
| status = "error" | ||
| self._log.error("Error collecting schema: %s", e) |
Worth including some stats in here? Such as how many databases/tables collected / time passed?
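One way this could look, as a rough sketch rather than the PR's code: keep running counters and a monotonic start time, then include them in the error log line. `collect_with_stats`, `collect_one`, and the logger name are hypothetical, and `collect_one` is assumed to return the number of tables it collected.

```python
import logging
import time

logger = logging.getLogger("schema_collector_sketch")  # hypothetical logger name


def collect_with_stats(databases, collect_one):
    """Run collect_one(name) per database and report counts and elapsed time on error."""
    start = time.monotonic()
    stats = {"databases": 0, "tables": 0}
    status = "success"
    try:
        for name in databases:
            # collect_one is assumed to return how many tables it collected.
            stats["tables"] += collect_one(name)
            stats["databases"] += 1
    except Exception as e:
        status = "error"
        logger.error(
            "Error collecting schema after %d databases, %d tables, %.2fs: %s",
            stats["databases"],
            stats["tables"],
            time.monotonic() - start,
            e,
        )
    return status, stats
```

The counters only advance after a database finishes, so the logged numbers describe work that completed before the failure.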
| self._log.debug("Starting collection of schemas for database %s", database['name']) | ||
| database_name = database['name'] | ||
| if not database_name: | ||
| self._log.warning("database has no name %v", database) |
| - self._log.warning("database has no name %v", database) | |
| + self._log.warning("database has no name %s", database) | |
Strings should use %s
This is the database object
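For reference, a small sketch of how the two format codes behave with Python's printf-style formatting, which the logging module applies lazily to the message and its arguments. The `database` dict is a hypothetical stand-in for the database object.

```python
database = {"name": ""}  # hypothetical database object with an empty name

# %s calls str() on its argument, so it works for dicts and other objects too.
message = "database has no name %s" % (database,)

# %v is not a Python format code, so the original format string cannot be rendered.
try:
    "database has no name %v" % (database,)
    v_is_valid = True
except ValueError:
    v_is_valid = False
```

So %s is appropriate even though the argument is an object rather than a string.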
| raise NotImplementedError("Subclasses must implement _get_databases") | ||
| @abstractmethod | ||
| def _get_cursor(self, database): |
| - def _get_cursor(self, database): | |
| + def _get_cursor(self, database) -> AbstractContextManager[Any]: | |
I think we can type this as returning a context manager, using AbstractContextManager from contextlib (from contextlib import AbstractContextManager).
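A minimal sketch of that annotation, assuming Python 3.9 or later (where `AbstractContextManager` can be subscripted at runtime). `CollectorSketch` and `FakeCollector` are hypothetical names, and a list iterator stands in for a real driver cursor. A method decorated with `contextlib.contextmanager` returns an object that satisfies the annotation.

```python
from abc import ABC, abstractmethod
from contextlib import AbstractContextManager, contextmanager
from typing import Any


class CollectorSketch(ABC):
    @abstractmethod
    def _get_cursor(self, database) -> AbstractContextManager[Any]:
        """Return a context manager that yields a cursor for `database`."""
        raise NotImplementedError("Subclasses must implement _get_cursor")


class FakeCollector(CollectorSketch):
    @contextmanager
    def _get_cursor(self, database):
        # A list iterator stands in for a real driver cursor; a real
        # implementation would open the cursor here and close it after the yield.
        yield iter([("users",), ("orders",)])


collector = FakeCollector()
manager = collector._get_cursor("app_db")
with manager as cursor:
    rows = list(cursor)
```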
What does this PR do?
Adds a shared schema collector for the DBM integrations (Postgres, MySQL, SQLServer).
Motivation
This class centralizes shared logic around iteration, buffering, submission, etc. Individual integrations will implement subclasses that handle actual data retrieval and mapping. See #21501 for the Postgres implementation.
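The split described above can be pictured roughly as follows. This is an illustrative sketch, not the class added by this PR: `SchemaCollectorSketch`, `InMemoryCollector`, `_get_rows`, `_map_row`, the `submit` callable, and the `max_buffer` threshold are all hypothetical. The base class owns iteration, buffering, and submission, while the subclass only supplies data retrieval and mapping.

```python
from abc import ABC, abstractmethod


class SchemaCollectorSketch(ABC):
    """Hypothetical base class: owns iteration, buffering, and submission."""

    def __init__(self, submit, max_buffer=2):
        self._submit = submit          # callable receiving a list of mapped rows
        self._max_buffer = max_buffer  # flush threshold (assumed, not from the PR)
        self._buffer = []

    def collect(self):
        for database in self._get_databases():
            for row in self._get_rows(database):
                self._buffer.append(self._map_row(database, row))
                self._maybe_flush(force=False)
        # Submit whatever is left once every database has been visited.
        self._maybe_flush(force=True)

    def _maybe_flush(self, force):
        if self._buffer and (force or len(self._buffer) >= self._max_buffer):
            self._submit(list(self._buffer))
            self._buffer.clear()

    @abstractmethod
    def _get_databases(self): ...

    @abstractmethod
    def _get_rows(self, database): ...

    @abstractmethod
    def _map_row(self, database, row): ...


class InMemoryCollector(SchemaCollectorSketch):
    """Hypothetical subclass: only data retrieval and mapping live here."""

    DATA = {"db1": ["users", "orders"], "db2": ["events"]}

    def _get_databases(self):
        return list(self.DATA)

    def _get_rows(self, database):
        return self.DATA[database]

    def _map_row(self, database, row):
        return {"database": database, "table": row}


payloads = []
InMemoryCollector(payloads.append).collect()
```

With a threshold of two, the first payload holds both db1 tables and the final forced flush submits the remaining db2 table.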
Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- Add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.