Description
Is your feature request related to a problem? Please describe.
Where clause is not supported for rule type: CustomSql
I am using the DQDL CustomSql rule type. I would like to write rules so that any records that don't match certain SQL matching criteria are marked as skipped. That would make it easy to tell which DQ results passed, failed, or were skipped for a given rule. It would also significantly reduce the post-processing logic currently needed to identify the skipped records. Other DQDL rule types already support this through a where clause.
Describe the solution you'd like
I would like to write a CustomSql rule with an additional where clause that defines the matching criteria, as other rule types already allow.
Describe alternatives you've considered
As an alternative, I considered additional post-processing logic that runs a second row-level evaluation based on the SQL definition. This is largely redundant, and ideally it would be done as part of the check itself.
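For illustration, the post-processing workaround amounts to something like the following. This is a minimal sketch under stated assumptions, not Glue's behavior or API. `label_rows` is a hypothetical helper. `passing_ids` is assumed to hold the ids returned by the CustomSql query. `where` is a plain Python predicate that mirrors the desired where clause. The "PASSED"/"FAILED" labels are illustrative, and "SKIPPED" matches the `filteredRowLabel` used in the sample script below.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def label_rows(
    rows: Iterable[Row],
    passing_ids: Set[object],
    where: Callable[[Row], bool],
    id_column: str = "id",
    skipped_label: str = "SKIPPED",
) -> List[str]:
    """Hypothetical helper: label each row PASSED, FAILED, or skipped_label.

    Assumption: a row passes if its id appears in the CustomSql query output.
    `where` mirrors the desired filter, e.g. "id is not NULL".
    """
    labels: List[str] = []
    for row in rows:
        if not where(row):
            # Row is outside the rule's scope, so it is skipped rather than failed
            labels.append(skipped_label)
        elif row.get(id_column) in passing_ids:
            labels.append("PASSED")
        else:
            labels.append("FAILED")
    return labels


rows = [{"id": "foo"}, {"id": "bar"}, {"id": None}]
# Ids that "SELECT id FROM primary a WHERE id = 'foo'" would return
passing = {"foo"}
print(label_rows(rows, passing, where=lambda r: r.get("id") is not None))
# ['PASSED', 'FAILED', 'SKIPPED']
```

Supporting a where clause on CustomSql would make this second pass unnecessary, because the filter would already be applied during the rule evaluation.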
Additional context
Sample Glue Script
from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality
# Create Glue context
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
table_name = "redacted"
# Define DynamicFrame
orbit_data = glueContext.create_dynamic_frame.from_catalog(
database="orbit_data_quality",
table_name=table_name
)
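# Create data quality ruleset.
# The trailing `where "id is not NULL"` clause is the requested syntax; it is
# currently rejected with "Where clause is not supported for rule type: CustomSql".
ruleset = """Rules = [
CustomSql "SELECT id FROM primary a WHERE id = 'foo'" where "id is not NULL"
]"""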
# Evaluate data quality
dqResults = EvaluateDataQuality.apply(
frame=orbit_data,
ruleset=ruleset,
publishing_options={
"dataQualityEvaluationContext": "redacted",
"enableDataQualityCloudWatchMetrics": True,
"enableDataQualityResultsPublishing": True,
"resultsS3Prefix": "s3://redacted",
},
additional_options={
"performanceTuning.caching": "CACHE_INPUT",
"observations.scope": "ALL",
"rowLevelConfiguration.filteredRowLabel": "SKIPPED"
}
)
# Inspect data quality results
dqResults.printSchema()
dqResults.toDF().show()