
[FEATURE] Storing checks using DQX classes #368

@davidwanner-8451


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am very new to DQX and looked around for an answer without success. I am getting an error similar to #205, but in my case it occurs when I try to save checks defined with DQX classes to a UC table.

I build a series of rules using DQRowRule, but I am then unable to save them with dq_engine.save_checks_in_table. Running it raises the following error:

ValueError: Unsupported check type: <class 'databricks.labs.dqx.rule.DQRowSingleColRule'>

Expected Behavior

I would expect to be able to write rules out to a UC table even when they are defined using the DQX classes.

Steps To Reproduce

I am working on a UC-enabled cluster (15.4 LTS).

Below is a basic example of a rule I am trying to save:

from databricks.labs.dqx import check_funcs
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.rule import DQRowRule, DQRowRuleForEachCol
from databricks.sdk import WorkspaceClient

# Establish once
dq_engine = DQEngine(WorkspaceClient())

# Rules defined with DQX classes rather than metadata dicts
date_checks = [
  DQRowRule(
    name="start_date_not_future",
    check_func=check_funcs.is_not_in_future,
    column="START_DATE"
  ),
  DQRowRule(
    name="duration_in_range",
    check_func=check_funcs.is_in_range,
    column="DURATION_IN_WEEKS",
    check_func_kwargs={"min_limit": 12, "max_limit": 26}
  )
]

# Schema name used in the target table (same value as in the traceback below)
user = "d108220"

# This call raises the ValueError shown under "Relevant log output"
dq_engine.save_checks_in_table(checks=date_checks, table_name=f"learning_dev.{user}.date_checks", mode="overwrite")

Cloud

Azure

Operating System

macOS

Relevant log output

ValueError: Unsupported check type: <class 'databricks.labs.dqx.rule.DQRowSingleColRule'>
Unsupported check type: <class 'databricks.labs.dqx.rule.DQRowSingleColRule'>
File <command-5234376178056133>, line 24
      9 date_checks = [
     10   DQRowRule(
     11     name="start_date_not_future",
   (...)
     20   )
     21 ]
     23 user = "d108220"
---> 24 dq_engine.save_checks_in_table(checks=date_checks, table_name=f"learning_dev.{user}.date_checks", mode="overwrite")
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-251ce94f-a41b-4869-a242-df00f3607b26/lib/python3.11/site-packages/databricks/labs/dqx/engine.py:968, in DQEngine.save_checks_in_table(checks, table_name, run_config_name, mode)
    960 """
    961 Save checks to a Delta table in the workspace.
    962 :param checks: list of dq rules to save
   (...)
    965 :param mode: Output mode for writing checks to Delta (e.g. `append` or `overwrite`)
    966 """
    967 logger.info(f"Saving quality rules (checks) to table {table_name}")
--> 968 DQEngine._save_checks_in_table(checks, table_name, run_config_name, mode)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-251ce94f-a41b-4869-a242-df00f3607b26/lib/python3.11/site-packages/databricks/labs/dqx/engine.py:1017, in DQEngine._save_checks_in_table(checks, table_name, run_config_name, mode)
   1015 @staticmethod
   1016 def _save_checks_in_table(checks: list[dict], table_name: str, run_config_name: str, mode: str):
-> 1017     rules_df = DQEngineCore.build_dataframe_from_quality_rules(checks, run_config_name=run_config_name)
   1018     rules_df.write.option("replaceWhere", f"run_config_name = '{run_config_name}'").saveAsTable(
   1019         table_name, mode=mode
   1020     )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-251ce94f-a41b-4869-a242-df00f3607b26/lib/python3.11/site-packages/databricks/labs/dqx/engine.py:222, in DQEngineCore.build_dataframe_from_quality_rules(checks, run_config_name, spark)
    219 if spark is None:
    220     spark = SparkSession.builder.getOrCreate()
--> 222 dq_rule_checks = DQEngineCore.build_checks_by_metadata(checks)
    224 dq_rule_rows = []
    225 for dq_rule_check in dq_rule_checks:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-251ce94f-a41b-4869-a242-df00f3607b26/lib/python3.11/site-packages/databricks/labs/dqx/engine.py:267, in DQEngineCore.build_checks_by_metadata(checks, custom_checks)
    265 status = DQEngineCore.validate_checks(checks, custom_checks)
    266 if status.has_errors:
--> 267     raise ValueError(str(status))
    269 dq_rule_checks: list[DQRule] = []
    270 for check_def in checks:
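
The traceback above shows that _save_checks_in_table is annotated with checks: list[dict], which suggests the method currently expects checks in metadata (dict) form rather than as DQRowRule objects. As a possible interim workaround, the same two rules could be written as dicts. The sketch below is an assumption, not a confirmed fix: to_metadata_check is a hypothetical helper, not part of DQX, and the key names (name, criticality, check, function, arguments, column) follow the metadata format as I understand it for releases that use column in DQRowRule. They may differ in other versions.

```python
from typing import Any


def to_metadata_check(
    name: str,
    function: str,
    column: str,
    criticality: str = "error",
    **func_kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical helper (not part of DQX): build one metadata-style check dict.

    The key layout is an assumption about the DQX metadata format.
    """
    return {
        "name": name,
        "criticality": criticality,
        "check": {
            "function": function,  # check function referenced by name
            "arguments": {"column": column, **func_kwargs},
        },
    }


# The same two rules as in the reproduction, expressed as plain dicts.
date_checks_metadata = [
    to_metadata_check("start_date_not_future", "is_not_in_future", "START_DATE"),
    to_metadata_check(
        "duration_in_range",
        "is_in_range",
        "DURATION_IN_WEEKS",
        min_limit=12,
        max_limit=26,
    ),
]

# On a Databricks cluster with dq_engine and user defined as in the reproduction,
# the dict-based checks would then be passed to the same call (not run here):
# dq_engine.save_checks_in_table(
#     checks=date_checks_metadata,
#     table_name=f"learning_dev.{user}.date_checks",
#     mode="overwrite",
# )
```

This only sidesteps the error. The request in this issue still stands: save_checks_in_table should accept rules defined with the DQX classes directly.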
