Skip to content

[BUG] duplicates in effectivity satelites #231

@pdaszczynski

Description

@pdaszczynski

Describe the bug
When for two last records in stage for one src_dfk has not been changed src_sfk and between previous and current data processing for this particular src_dfk nothing changed eff_sat always insert one additional record to table.

Environment

dbt version: 1.7.8
automate_dv version: 0.10.2
Database/Platform: postgres

To Reproduce
Steps to reproduce the behavior:

  1. Insert to stage like below, I have deleted *hk columns to not do the mess here
updated_date organisation_id customer_id start_date end_date effective_from load_date
2024-04-12 7 9491 2024-04-12 2024-04-19 2024-04-12 2024-04-24 08:18:21.882583 +00:00
2024-04-19 5 9491 2024-04-19 2024-04-21 2024-04-19 2024-04-24 08:18:21.882583 +00:00
2024-04-21 5 9491 2024-04-21 9999-12-31 2024-04-21 2024-04-24 08:18:21.882583 +00:00
  1. create effectivity satelite
{{ config(materialized='incremental')  }}
{%- set source_model = "stage_organisation_customer" -%}
{%- set src_pk = "CUSTOMER_ORGANISATION_HK" -%}
{%- set src_dfk = "CUSTOMER_HK"       -%}
{%- set src_sfk = "ORGANISATION_HK"         -%}
{%- set src_start_date = "START_DATE" -%}
{%- set src_end_date =   "END_DATE"     -%}
{%- set src_eff = "EFFECTIVE_FROM"    -%}
{%- set src_ldts = "LOAD_DATE"    -%}
{%- set src_source = "RECORD_SOURCE"  -%}
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
                       src_start_date=src_start_date,
                       src_end_date=src_end_date,
                       src_eff=src_eff, src_ldts=src_ldts,
                       src_source=src_source,
                       source_model=source_model) }}
  1. run dbt couple of times
  2. See result in effectivity satelite
customer_hk organisation_hk start_date end_date load_date
003dd617-c12d-444f-f9c8-0f717c3fa982 8f14e45f-ceea-167a-5a36-dedd4bea2543 2024-04-12 2024-04-19 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:21.036765 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 08:02:28.049750 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:37.983116 +00:00

Expected behavior
Records like below should not been added all the time when I run dbt

customer_hk organisation_hk start_date end_date load_date
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:21.036765 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 08:02:28.049750 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:37.983116 +00:00

Additional context
Not sure but I think this is because there is also same load_date for records and in final query partition by and order in this shape is not enough

latest_records AS (
    SELECT * FROM (
        SELECT b.CUSTOMER_ORGANISATION_HK, b.CUSTOMER_HK, b.ORGANISATION_HK, b.START_DATE, b.END_DATE, b.EFFECTIVE_FROM, b.LOAD_DATE, b.RECORD_SOURCE,
               ROW_NUMBER() OVER (
                    PARTITION BY b.CUSTOMER_ORGANISATION_HK
                    ORDER BY b.LOAD_DATE DESC
               ) AS row_num
        FROM "dbt_dv"."dv"."eff_sat_customer_organisation" AS b
        where customer_hk in (select customer_hk from stage.stage_organisation_customer where customer_id in (9491))
    ) AS inner_rank
    WHERE row_num = 1
)

AB#5346

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions