-
Notifications
You must be signed in to change notification settings - Fork 144
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
When for two last records in stage for one src_dfk has not been changed src_sfk and between previous and current data processing for this particular src_dfk nothing changed eff_sat always insert one additional record to table.
Environment
dbt version: 1.7.8
automate_dv version: 0.10.2
Database/Platform: postgres
To Reproduce
Steps to reproduce the behavior:
- Insert to stage like below, I have deleted *hk columns to not do the mess here
updated_date | organisation_id | customer_id | start_date | end_date | effective_from | load_date |
---|---|---|---|---|---|---|
2024-04-12 | 7 | 9491 | 2024-04-12 | 2024-04-19 | 2024-04-12 | 2024-04-24 08:18:21.882583 +00:00 |
2024-04-19 | 5 | 9491 | 2024-04-19 | 2024-04-21 | 2024-04-19 | 2024-04-24 08:18:21.882583 +00:00 |
2024-04-21 | 5 | 9491 | 2024-04-21 | 9999-12-31 | 2024-04-21 | 2024-04-24 08:18:21.882583 +00:00 |
- create effectivity satelite
{{ config(materialized='incremental') }}
{%- set source_model = "stage_organisation_customer" -%}
{%- set src_pk = "CUSTOMER_ORGANISATION_HK" -%}
{%- set src_dfk = "CUSTOMER_HK" -%}
{%- set src_sfk = "ORGANISATION_HK" -%}
{%- set src_start_date = "START_DATE" -%}
{%- set src_end_date = "END_DATE" -%}
{%- set src_eff = "EFFECTIVE_FROM" -%}
{%- set src_ldts = "LOAD_DATE" -%}
{%- set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
src_start_date=src_start_date,
src_end_date=src_end_date,
src_eff=src_eff, src_ldts=src_ldts,
src_source=src_source,
source_model=source_model) }}
- run dbt couple of times
- See result in effectivity satelite
customer_hk | organisation_hk | start_date | end_date | load_date |
---|---|---|---|---|
003dd617-c12d-444f-f9c8-0f717c3fa982 | 8f14e45f-ceea-167a-5a36-dedd4bea2543 | 2024-04-12 | 2024-04-19 | 2024-04-24 07:50:07.914253 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-19 | 2024-04-21 | 2024-04-24 07:50:07.914253 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-21 | 9999-12-31 | 2024-04-24 07:50:07.914253 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-21 | 9999-12-31 | 2024-04-24 08:02:21.036765 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-19 | 2024-04-21 | 2024-04-24 08:02:28.049750 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-21 | 9999-12-31 | 2024-04-24 08:02:37.983116 +00:00 |
Expected behavior
Records like below should not been added all the time when I run dbt
customer_hk | organisation_hk | start_date | end_date | load_date |
---|---|---|---|---|
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-21 | 9999-12-31 | 2024-04-24 08:02:21.036765 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-19 | 2024-04-21 | 2024-04-24 08:02:28.049750 +00:00 |
003dd617-c12d-444f-f9c8-0f717c3fa982 | e4da3b7f-bbce-2345-d777-2b0674a318d5 | 2024-04-21 | 9999-12-31 | 2024-04-24 08:02:37.983116 +00:00 |
Additional context
Not sure but I think this is because there is also same load_date for records and in final query partition by and order in this shape is not enough
latest_records AS (
SELECT * FROM (
SELECT b.CUSTOMER_ORGANISATION_HK, b.CUSTOMER_HK, b.ORGANISATION_HK, b.START_DATE, b.END_DATE, b.EFFECTIVE_FROM, b.LOAD_DATE, b.RECORD_SOURCE,
ROW_NUMBER() OVER (
PARTITION BY b.CUSTOMER_ORGANISATION_HK
ORDER BY b.LOAD_DATE DESC
) AS row_num
FROM "dbt_dv"."dv"."eff_sat_customer_organisation" AS b
where customer_hk in (select customer_hk from stage.stage_organisation_customer where customer_id in (9491))
) AS inner_rank
WHERE row_num = 1
)
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working