This repository was archived by the owner on Sep 16, 2025. It is now read-only.

Improve DatabricksNotebookOperator monitoring job behaviour #80

@tatiana

Description

A customer reported that, from time to time, instances of DatabricksNotebookOperator remain stuck in a running state in Airflow even after the job has completed on Databricks.

The task logs should explain what the Databricks job is doing, but they are empty.

While checking our code, I noticed that the implementation could be improved.
https://github.com/astronomer/astro-provider-databricks/blob/3e1ca039a024a98f9079d178478aa24702e15453/src/astro_databricks/operators/notebook.py#L235C1-L238C64

This implementation has already been improved in our contribution to Airflow (apache/airflow#39178):

https://github.com/astronomer/airflow/blob/20dacc7cec64d0055fad79943fd6afa453dbe775/airflow/providers/databricks/operators/databricks.py#L1038-L1063

Since this affects an Astronomer customer and we have not completed the migration yet, my suggestion is that:

  • We give visibility into what is happening on the Airflow worker node by logging something like "Waiting for the job to complete, current status: PENDING" on each poll.
  • We make the job-status polling implementation consistent with what we have contributed to Airflow.
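A minimal sketch of what the two suggestions above could look like combined. This is not the operator's actual code: `wait_for_run_completion`, `get_run_state`, and `TERMINAL_STATES` are hypothetical names for illustration, and the real operator would fetch the state via the Databricks hook rather than a callable. The point is that every polling iteration emits a log line, so the worker logs are never empty while the task waits:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Hypothetical set of terminal life-cycle states; the real Databricks
# Runs API defines the authoritative list.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run_completion(get_run_state, poll_interval=5, sleep=time.sleep):
    """Poll a Databricks run until it reaches a terminal state.

    get_run_state: callable returning the current life-cycle state string
    (stands in for a Databricks API call in this sketch).
    """
    while True:
        state = get_run_state()
        if state in TERMINAL_STATES:
            logger.info("Job finished with terminal state: %s", state)
            return state
        # Log on every iteration so the Airflow worker logs show progress.
        logger.info("Waiting for the job to complete, current status: %s", state)
        sleep(poll_interval)
```

Injecting `sleep` keeps the loop testable without real delays; in the operator itself this would simply be `time.sleep` with the configured polling interval.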
