Add Profiler Extract Ingestion Job for Local Dashboards #2101
base: main
Conversation
        )
    ],
    "tasks": [
        NotebookTask(
@sundarshankar89's comment from PR#2000: "There are 2 ways we can implement this: have the ingestion job as a python package and use a wheel task, or have the notebook uploaded and then run the jobs. I prefer option 1."
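For context, a wheel-based task along the lines of option 1 can be defined with the Databricks SDK roughly as below; the package name, entry point, and parameters are illustrative, not the values this PR ends up using:

```python
# Sketch of option 1: run the ingestion job as a python_wheel_task.
# Package name, entry point, and parameters are assumptions for illustration.
from databricks.sdk.service import compute, jobs

def profiler_ingestion_task(task_key: str, wheel_path: str) -> jobs.Task:
    return jobs.Task(
        task_key=task_key,
        description="Ingest profiler extracts into Unity Catalog",
        python_wheel_task=jobs.PythonWheelTask(
            package_name="databricks_labs_lakebridge",  # assumed package name
            entry_point="profiler-ingest",              # hypothetical entry point
            parameters=["my_catalog", "my_schema", "/path/to/extract.db"],
        ),
        libraries=[compute.Library(whl=wheel_path)],
    )
```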
✅ 46/46 passed, 5 flaky, 2m50s total
Flaky tests:
Running from acceptance #2793
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

| Coverage Diff | main   | #2101  | +/-    |
|---------------|--------|--------|--------|
| Coverage      | 64.87% | 65.03% | +0.15% |
| Files         | 96     | 96     |        |
| Lines         | 7895   | 7933   | +38    |
| Branches      | 821    | 823    | +2     |
| Hits          | 5122   | 5159   | +37    |
| Misses        | 2593   | 2594   | +1     |
| Partials      | 180    | 180    |        |

☔ View full report in Codecov by Sentry.
def _job_profiler_ingestion_task(self, task_key: str, description: str, lakebridge_wheel_path: str) -> Task:
    libraries = [
        compute.Library(whl=lakebridge_wheel_path),
        compute.PythonPyPiLibrary(package="duckdb")
The ingestion job depends on the duckdb library to read the profiler extract tables.
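For readers unfamiliar with the extract format, reading it with duckdb looks roughly like this; the file path is an assumption for illustration:

```python
# Sketch: a profiler extract is a duckdb database file; list its tables and
# read each one into pandas. The path below is illustrative.
import duckdb

with duckdb.connect(database="/tmp/profiler_extract.db", read_only=True) as conn:
    # Each SHOW ALL TABLES row is (database, schema, name, column_names, column_types, temporary)
    for row in conn.execute("SHOW ALL TABLES").fetchall():
        table_name = row[2]
        df = conn.execute(f'SELECT * FROM "{table_name}"').fetch_df()  # pandas DataFrame
        print(f"{table_name}: {len(df)} rows")
```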
def main(*argv) -> None:
    logger.debug(f"Arguments received: {argv}")
    assert len(sys.argv) == 4, f"Invalid number of arguments: {len(sys.argv)}"
"Manually" testing this main() function outside of the wheel file, there appeared to be 3 additional arguments pertaining to the Python notebook session: 1) Interpreter, 2) -f flag, 3) env settings as a JSON file. Please review that the assumption that they will not be present in a wheel based job task is correct.
- Add unit test for profiler ingestion job deployment.
- Update profiler ingestion job to be wheel-based.
- Update profiler ingestion job unit tests.
- Update docs (#2110)
- Update docs
- Update docs (#2111)
- Update docs again
- Add table ingestion logic to profiler ingest job.
- Update table ingestion exception handling.
- Add duckdb dependency in Job Task definition.
- Correct library dependencies.
- Update entry point name to .
- Narrow exception handling for single table ingestion.
- Parse args in execute main method.
LGTM
Small nit: I'm thinking about how to make this more generic for all sources, so maintaining a static list will help here.
def _ingest_profiler_tables(catalog_name: str, schema_name: str, extract_location: str) -> None:
    try:
        with duckdb.connect(database=extract_location) as duck_conn:
            tables_to_ingest = duck_conn.execute("SHOW ALL TABLES").fetchall()
I would rather have a static list of tables and validate that they are all present; otherwise raise a warning or log them in an audit table.
Also, I would like to maintain a run summary table stating which tables and how many records were ingested, for better reconciliation and observability.
On the second part, do you mean a physical table stored in a UC catalog? I keep track of the successfully ingested and skipped tables on lines 71-72 and log them on lines 86-89.
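To make the suggestion above concrete, validation against a static table list plus a run summary could look roughly like this; the expected table names and the summary shape are hypothetical:

```python
# Sketch of the reviewer's suggestion: validate the extract against a static
# list of expected tables and build a per-table run summary. Table names and
# the summary shape are hypothetical.
import duckdb

EXPECTED_TABLES = {"databases", "schemas", "objects", "columns"}  # hypothetical

def ingest_with_summary(extract_location: str) -> list[dict]:
    summary = []
    with duckdb.connect(database=extract_location, read_only=True) as conn:
        found = {row[2] for row in conn.execute("SHOW ALL TABLES").fetchall()}
        for missing in sorted(EXPECTED_TABLES - found):
            summary.append({"table": missing, "status": "missing", "rows": 0})
        for table in sorted(EXPECTED_TABLES & found):
            rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            summary.append({"table": table, "status": "ingested", "rows": rows})
    # summary could then be persisted to an audit/run-summary table in UC
    return summary
```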
Changes
What does this PR do?
Adds a new function to the common job deployer to install the local ingestion job. The job transforms profiler extracts into Unity Catalog–managed tables in the user’s local Databricks workspace, enabling the profiler summary (“local”) dashboards.
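The core transformation can be pictured roughly as below: copy each table out of the duckdb extract and save it as a Unity Catalog managed table. This is a sketch, not the PR's exact implementation; pandas as the interchange format and all names are illustrative:

```python
# Rough sketch of the ingestion step: one duckdb extract table -> one UC
# managed table. Uses pandas as the interchange format; names are illustrative.
import duckdb
from pyspark.sql import SparkSession

def ingest_table(spark: SparkSession, extract_location: str,
                 catalog: str, schema: str, table: str) -> None:
    with duckdb.connect(database=extract_location, read_only=True) as conn:
        pdf = conn.execute(f'SELECT * FROM "{table}"').fetch_df()
    (spark.createDataFrame(pdf)
        .write.mode("overwrite")
        .saveAsTable(f"{catalog}.{schema}.{table}"))
```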
Relevant implementation details
install_stateisn’t lost between create/update and save, especially if an exception is raised before the save.Caveats/things to watch out for when reviewing:
Linked issues
This PR complements PR#2000.
Functionality
- `databricks labs lakebridge ...`

Tests