-
Notifications
You must be signed in to change notification settings - Fork 535
Description
Describe the feature
I would like one or two data tests that could do the following:
- A test that can assert all rows have the same value, potentially within groups of rows grouped using
group_by_columns
. In this case, we don't care what the value is - we just want all the rows to be the same within each group. This would basically be the opposite of thenot_constant
test, and had previously been proposed in Add a test is_constant #835 (only closed due to inactivity). - A test that can assert all rows have a specific value, as specified within the test. Often it would be used with the
where
config to ensure that a field has a specific value under some condition.
I think these are potentially separate tests (e.g. is_constant
and is_constant_value
? is_constant
and is_value
? is_constant
and accepted_value
?), but one could also argue that they should be combined into one is_constant
test that tries to do it all and changes behavior based on whether test properties group_by_columns
or value
is provided. (The latter is the reason why I put this in one GitHub issue).
Describe alternatives you've considered
The dbt_utils.expression_is_true
test can at least be used to emulate the behavior of the proposed test that asserts a field has a constant value. The accepted_values
dbt test could also be used, although the list format is more verbose.
Additional context
It would be relevant to all database engines.
Who will this benefit?
For the first use case, the previous user in #835 suggested that they checked that a timezone is constant across several territories. Here's an example of how it might be used:
- name: time_zone
data_type: string
data_tests:
- dbt_utils.is_constant:
group_by_columns: [territory]
For the second use case, here is an example of where we might want to make sure a column has a specific value. We want to make sure the is_verified
column is FALSE
for rows that do not come from a specific origin
that is allowed to have verified rows.
- name: is_verified
data_type: boolean
data_tests:
- not_null
# Make sure the value is always FALSE when origin is not 'some_enum_value'
- dbt_utils.is_value:
value: 'FALSE'
quote: false
config:
where: "origin != 'some_enum_value'"
# Alternative ways of expressing the test using existing dbt_utils / dbt functionality
- dbt_utils.expression_is_true:
expression: '= FALSE'
config:
where: "origin != 'some_enum_value'"
- accepted_values:
values: ['FALSE']
quote: false
config:
where: "origin != 'some_enum_value'"
Are you interested in contributing this feature?
I would be happy to contribute a pull request. I would first need agreement on whether this should be done as one or two generic tests, and what they should be named. Once the test names and parameters / behaviors are agreed upon, it will be straightforward for me to implement this with integration tests without further significant assistance.