Skip to content

Add test(s) to assert that rows have constant values: either within a group, or with a given value #1048

@james-johnston-thumbtack

Description

Describe the feature

I would like one or two data tests that could do the following:

  • A test that can assert all rows have the same value, potentially within groups of rows grouped using group_by_columns. In this case, we don't care what the value is - we just want all the rows to be the same within each group. This would basically be the opposite of the not_constant test, and had previously been proposed in Add a test is_constant #835 (only closed due to inactivity).
  • A test that can assert all rows have a specific value, as specified within the test. Often it would be used with the where config to ensure that a field has a specific value under some condition.

I think these are potentially separate tests (e.g. is_constant and is_constant_value? is_constant and is_value? is_constant and accepted_value?), but one could also argue that they should be combined into one is_constant test that tries to do it all and changes behavior based on whether test properties group_by_columns or value is provided. (The latter is the reason why I put this in one GitHub issue).

Describe alternatives you've considered

The dbt_utils.expression_is_true test can at least be used to emulate the behavior of the proposed test that asserts a field has a constant value. The accepted_values dbt test could also be used, although the list format is more verbose.

Additional context

It would be relevant to all database engines.

Who will this benefit?

For the first use case, the previous user in #835 suggested that they checked that a timezone is constant across several territories. Here's an example of how it might be used:

      - name: time_zone
        data_type: string
        data_tests:
          - dbt_utils.is_constant:
            group_by_columns: [territory]

For the second use case, here is an example of where we might want to make sure a column has a specific value. We want to make sure the is_verified column is FALSE for rows that do not come from a specific origin that is allowed to have verified rows.

      - name: is_verified
        data_type: boolean
        data_tests:
          - not_null
          # Make sure the value is always FALSE when origin is not 'some_enum_value'
          - dbt_utils.is_value:
            value: 'FALSE'
            quote: false
            config:
              where: "origin != 'some_enum_value'"
          # Alternative ways of expressing the test using existing dbt_utils / dbt functionality
          - dbt_utils.expression_is_true:
              expression: '= FALSE'
              config:
                where: "origin != 'some_enum_value'"
          - accepted_values:
              values: ['FALSE']
              quote: false
              config:
                where: "origin != 'some_enum_value'"

Are you interested in contributing this feature?

I would be happy to contribute a pull request. I would first need agreement on whether this should be done as one or two generic tests, and what they should be named. Once the test names and parameters / behaviors are agreed upon, it will be straightforward for me to implement this with integration tests without further significant assistance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions