[FEATURE] Add support for CustomSql rowLevelConfiguration.filteredRowLabel - Where clause is not supported for rule type: CustomSql #623

@KaylanHusband


Is your feature request related to a problem? Please describe.

Where clause is not supported for rule type: CustomSql

I am using the DQDL CustomSql rule type. I would like to write rules so that any records that don't match a given SQL filter condition are marked as skipped. That makes it easy to tell which records passed, failed, or were skipped for a given rule, and it would significantly reduce the complexity of the post-processing logic currently needed to identify skipped records. Other DQDL rule types already support this.

Describe the solution you'd like
I would like to write a CustomSql rule with an additional where clause denoting the matching criteria, similar to the implementation in other rule types.
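For comparison, here is a rough sketch of the two ruleset shapes side by side. The `ColumnValues` rule is only an illustration of a rule type that, as far as I understand the DQDL docs, accepts a trailing where clause; it is not meant as an exact equivalent of the CustomSql check. The variable names are made up for this example.

```python
# Sketch only: contrast a rule type that accepts a where clause with the
# CustomSql form requested in this issue. Variable names are illustrative.

# Assumed to be accepted today: a column-level rule with a trailing where clause.
supported_ruleset = """Rules = [
ColumnValues "id" in ["foo"] where "id is not NULL"
]"""

# Requested here: the same trailing where clause on a CustomSql rule, which
# currently fails with "Where clause is not supported for rule type: CustomSql".
requested_ruleset = """Rules = [
CustomSql "SELECT id FROM primary a WHERE id = 'foo'" where "id is not NULL"
]"""

# Both rulesets carry the same filter expression after the rule body.
filter_clause = 'where "id is not NULL"'
same_filter = filter_clause in supported_ruleset and filter_clause in requested_ruleset
```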

Describe alternatives you've considered
As an alternative, additional post-processing logic was considered to perform a subsequent row-level evaluation based on the SQL definition. This is largely redundant, and ideally it would be done as part of the check itself.
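To make the alternative concrete, here is a minimal sketch of that kind of post-processing, written in plain Python rather than Spark so it stays self-contained. The function name `label_rows`, the `outcome` key, and the "Passed"/"Failed" strings are assumptions for illustration, not Glue's actual row-level output schema.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def label_rows(
    rows: List[Row],
    applies: Callable[[Row], bool],
    passes: Callable[[Row], bool],
    skipped_label: str = "SKIPPED",
) -> List[Dict[str, Optional[str]]]:
    """Re-evaluate each row after the DQ run and attach an outcome label.

    applies: mirrors the where clause (rows it rejects are skipped).
    passes:  mirrors the condition inside the CustomSql statement.
    """
    labeled = []
    for row in rows:
        if not applies(row):
            outcome = skipped_label  # filtered out by the where clause
        elif passes(row):
            outcome = "Passed"
        else:
            outcome = "Failed"
        labeled.append({**row, "outcome": outcome})
    return labeled


# Hypothetical sample rows standing in for the evaluated table.
rows: List[Row] = [{"id": "foo"}, {"id": "bar"}, {"id": None}]

labeled = label_rows(
    rows,
    applies=lambda r: r["id"] is not None,  # where "id is not NULL"
    passes=lambda r: r["id"] == "foo",      # WHERE id = 'foo'
)
outcomes = [r["outcome"] for r in labeled]
```

This duplicates both the filter and the rule condition outside the ruleset, which is the redundancy a where clause on CustomSql would remove.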

Additional context
Sample Glue Script

from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality

# Create Glue context
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

table_name = "redacted"

# Define DynamicFrame
orbit_data = glueContext.create_dynamic_frame.from_catalog(
    database="orbit_data_quality",
    table_name=table_name
)

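# Create data quality ruleset
# The trailing where clause on the CustomSql rule is what triggers:
# "Where clause is not supported for rule type: CustomSql"
ruleset = """Rules = [
CustomSql "SELECT id FROM primary a WHERE id = 'foo'" where "id is not NULL"
]"""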

# Evaluate data quality
dqResults = EvaluateDataQuality.apply(
    frame=orbit_data,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "redacted",
        "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True,
        "resultsS3Prefix": "s3://redacted",
    },
    additional_options={
        "performanceTuning.caching": "CACHE_INPUT",
        "observations.scope": "ALL",
        "rowLevelConfiguration.filteredRowLabel": "SKIPPED"
    }
)

# Inspect data quality results
dqResults.printSchema()
dqResults.toDF().show()

Metadata

Labels: enhancement (New feature or request)
