@sethsamuel sethsamuel commented Oct 21, 2025

What does this PR do?

Adds a shared schema collector for the DBM integrations (Postgres, MySQL, SQL Server).

Motivation

This class centralizes shared logic around iteration, buffering, submission, etc. Individual integrations will implement subclasses that handle actual data retrieval and mapping. See #21501 for the Postgres implementation.
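
As a rough illustration of that split, here is a minimal sketch, not the code from this PR. The names `_get_databases`, `_get_cursor`, and `maybe_flush` appear in the review snippets in this thread; the class names, `_map_row`, the buffer threshold, and the `submitted` list are assumptions made up for the example.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class SchemaCollectorSketch(ABC):
    """Hypothetical base class: owns iteration, buffering, and submission."""

    def __init__(self, max_buffered_rows=2):
        self._max_buffered_rows = max_buffered_rows  # assumed flush threshold
        self._buffer = []
        self.submitted = []  # stands in for the real submission channel

    @abstractmethod
    def _get_databases(self):
        """Return an iterable of dicts that each carry a 'name' key."""

    @abstractmethod
    def _get_cursor(self, database_name):
        """Return a context manager that yields an iterable of rows."""

    @abstractmethod
    def _map_row(self, database, row):
        """Turn one raw row into a payload entry (hypothetical hook)."""

    def maybe_flush(self, is_last_payload):
        # Submit when the buffer is full or when collection is finishing.
        is_full = len(self._buffer) >= self._max_buffered_rows
        if self._buffer and (is_last_payload or is_full):
            self.submitted.append(list(self._buffer))
            self._buffer.clear()

    def collect_schemas(self):
        for database in self._get_databases():
            with self._get_cursor(database["name"]) as cursor:
                for row in cursor:
                    self._buffer.append(self._map_row(database, row))
                    self.maybe_flush(is_last_payload=False)
        self.maybe_flush(is_last_payload=True)


class InMemoryCollector(SchemaCollectorSketch):
    """Toy subclass that serves rows from a dict instead of a database."""

    TABLES = {"db1": ["users", "orders", "items"], "db2": ["events"]}

    def _get_databases(self):
        return [{"name": name} for name in self.TABLES]

    @contextmanager
    def _get_cursor(self, database_name):
        yield iter(self.TABLES[database_name])

    def _map_row(self, database, row):
        return {"database": database["name"], "table": row}


collector = InMemoryCollector()
collector.collect_schemas()
```

The subclass only supplies data retrieval and mapping; the base class decides when to buffer and when to submit.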

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

datadog-official bot commented Oct 21, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

test_statement_metrics_and_plans[master-EXEC multiQueryProc-expected_queries_patterns1-param_groups1-5-False-True-True-False-multiqueryproc] from test_statements.py (Datadog)
missing expected matching rows
assert 1 == 2
 +  where 1 = len([{'dd_commands': ['SELECT'], 'dd_comments': [], 'dd_tables': ['ϑings'], 'execution_count': 5, ...}])
 +  and   2 = len(["select @total = @total \\+ count\\(\\*\\) from sys\\.databases where name like '%_'", "select @total = @total \\+ count\\(\\*\\) from sys\\.sysobjects where type = 'U'"])

ℹ️ Info

🧪 All tests passed

🔗 Commit SHA: 2a65b0a

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 89.36170% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.12%. Comparing base (74c7edc) to head (2a65b0a).
⚠️ Report is 2 commits behind head on master.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 88 to 91
with self._get_cursor(database_name) as cursor:
    # Get the next row from the cursor
    next = self._get_next(cursor)
    while next:
Contributor

PyMySQL and psycopg cursors are iterable, and I suspect the same is true of the SQL Server drivers. We should be able to iterate over the cursor directly and avoid the need for _get_next(). Did you try something like the following?

Suggested change
- with self._get_cursor(database_name) as cursor:
-     # Get the next row from the cursor
-     next = self._get_next(cursor)
-     while next:
+ with self._get_cursor(database_name) as cursor:
+     for next in cursor:
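
To illustrate the equivalence behind this suggestion, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in driver. That is an assumption for the example only: whether the SQL Server drivers used by the integration behave the same way is not verified here. The loop variable is called `row` rather than `next` so it does not shadow the built-in.

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",)])

# Explicit fetch loop, in the style of a _get_next() helper.
fetched = []
with closing(conn.execute("SELECT name FROM t ORDER BY name")) as cursor:
    row = cursor.fetchone()
    while row:
        fetched.append(row[0])
        row = cursor.fetchone()

# Direct iteration over the cursor, as suggested in the review.
iterated = []
with closing(conn.execute("SELECT name FROM t ORDER BY name")) as cursor:
    for row in cursor:
        iterated.append(row[0])

conn.close()
```

Both loops visit the same rows in the same order; the second drops the manual fetch bookkeeping.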

    self.maybe_flush(is_last_payload)
except Exception as e:
    status = "error"
    self._log.error("Error collecting schema: %s", e)
Contributor

Worth including some stats here, such as how many databases/tables were collected and how much time has passed?
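
One way this could look, as a minimal sketch only: `collect_with_stats`, `collect_one`, and the `stats` keys are hypothetical names for the example, not part of the PR.

```python
import logging
import time

log = logging.getLogger("schema_collector_sketch")


def collect_with_stats(databases, collect_one):
    """Run collect_one(db) per database and log progress totals on failure."""
    start = time.monotonic()
    stats = {"databases": 0, "tables": 0}
    status = "success"
    try:
        for database in databases:
            # collect_one is assumed to return the number of tables it collected.
            stats["tables"] += collect_one(database)
            stats["databases"] += 1
    except Exception as e:
        status = "error"
        log.error(
            "Error collecting schema after %d databases, %d tables, %.2fs: %s",
            stats["databases"],
            stats["tables"],
            time.monotonic() - start,
            e,
        )
    return status, stats


def fake_collect(database):
    # Toy collector: every database has 3 tables, except one that fails.
    if database == "broken":
        raise RuntimeError("connection lost")
    return 3


status, stats = collect_with_stats(["db1", "db2", "broken"], fake_collect)
```

The error line then reports how far collection got before it failed, which is usually the first question when reading the log.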

self._log.debug("Starting collection of schemas for database %s", database['name'])
database_name = database['name']
if not database_name:
    self._log.warning("database has no name %v", database)
Contributor

Suggested change
- self._log.warning("database has no name %v", database)
+ self._log.warning("database has no name %s", database)

Strings should use %s

Contributor Author

This is the database object
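
For reference, Python's %-style formatting has no `%v` conversion, and `%s` applies `str()` to any argument, so it also works for a dict-like database object. A minimal sketch follows; the `database` dict is a made-up stand-in for the real object.

```python
database = {"name": "", "id": 7}  # hypothetical database object with an empty name

# %s calls str() on the argument, so it accepts a dict as well as a string.
ok_message = "database has no name %s" % (database,)

# %v is not a %-style conversion in Python, so formatting fails.
try:
    "database has no name %v" % (database,)
    v_error = None
except ValueError as e:
    v_error = str(e)
```

The logging module formats records with the same operator, so the `%v` call would be expected to produce a logging error instead of the intended warning.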

    raise NotImplementedError("Subclasses must implement _get_databases")

@abstractmethod
def _get_cursor(self, database):
Contributor

Suggested change
- def _get_cursor(self, database):
+ def _get_cursor(self, database) -> AbstractContextManager[Any]:

I think we can type this as returning a context manager, using from contextlib import AbstractContextManager.
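
A minimal sketch of how that annotation could be satisfied, assuming Python 3.9 or later (where `AbstractContextManager` can be subscripted at runtime) and using sqlite3 as a stand-in driver. The class names are hypothetical, and `Any` needs its own import from `typing`.

```python
import sqlite3
from abc import ABC, abstractmethod
from contextlib import AbstractContextManager, closing
from typing import Any


class CollectorSketch(ABC):
    @abstractmethod
    def _get_cursor(self, database) -> AbstractContextManager[Any]:
        """Return a context manager that yields a cursor for `database`."""


class SqliteCollector(CollectorSketch):
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    def _get_cursor(self, database) -> AbstractContextManager[Any]:
        # closing() is an AbstractContextManager that calls cursor.close() on exit.
        return closing(self._conn.cursor())


collector = SqliteCollector()
cursor_cm = collector._get_cursor("main")
with cursor_cm as cursor:
    value = cursor.execute("SELECT 1").fetchone()[0]
```

With the return type declared, a type checker can flag a subclass that returns a bare cursor where the base class uses it in a `with` statement.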
