Skip to content

Conversation

@DennisGibz
Copy link
Contributor

@DennisGibz DennisGibz commented Aug 5, 2025

… rollback trans on the script

Summary by Sourcery

Refine the load_FactDefaulterTracing.sql script by removing manual transaction control and adding CTE-based deduplication for differentiated care and defaulter tracing data before populating the fact table.

Enhancements:

  • Remove explicit BEGIN TRAN and ROLLBACK TRAN statements and add a leading semicolon before the initial CTE.
  • Introduce windowed CTEs to select the most recent differentiated care records per patient using ROW_NUMBER().
  • Introduce windowed CTEs to select the most recent defaulter tracing records per patient using ROW_NUMBER().
  • Update the visits_data CTE to join against the deduplicated CTEs and apply SELECT DISTINCT for the final dataset.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Aug 5, 2025

Reviewer's Guide

This PR simplifies the FactDefaulterTracing load script by removing explicit transaction boundaries, adding CTEs with row_number-based logic to deduplicate both differentiated care and defaulter tracing records, and updating the main visits_data CTE to consume these deduplicated sources and adjust field selections.

Entity relationship diagram for deduplication logic in FactDefaulterTracing load

erDiagram
    ODS_Care_CT_PatientVisits {
        int SiteCode
        int PatientPK
        string PatientPKHash
        string DifferentiatedCare
        date VisitDate
        int VisitID
    }
    ODS_Intermediate_Intermediate_LastVisitDate {
        int SiteCode
        int PatientPK
        date LastVisitDate
    }
    ODS_Care_CT_DefaulterTracing {
        int SiteCode
        int PatientPK
        string PatientPKHash
        string PatientIDHash
        date VisitDate
        int VisitID
        string Comments
        string TracingType
        string TracingOutcome
        string IsFinalTrace
    }
    ODS_Care_CT_Patient {
        string PatientPKHash
        int SiteCode
        date DOB
    }
    Latest_Differentiated_Rank {
        int SiteCode
        int PatientPK
        string PatientPKHash
        string DifferentiatedCare
        date VisitDate
        int VisitID
    }
    Latest_Latest_DefaulterTracing_Ordered_Rank {
        int SiteCode
        int PatientPK
        string PatientPKHash
        string PatientIDHash
        date VisitDate
        int VisitID
        string Comments
        string TracingType
        string TracingOutcome
        string IsFinalTrace
    }
    visits_data {
        string PatientPKHash
        string PatientIDHash
        int SiteCode
        int VisitID
        date VisitDate
        string TracingType
        string TracingOutcome
        string IsFinalTrace
        string Comments
        string DifferentiatedCare
        int AgeAtVisit
    }

    ODS_Care_CT_PatientVisits ||--o{ ODS_Intermediate_Intermediate_LastVisitDate : "joins on SiteCode, PatientPK, VisitDate"
    ODS_Care_CT_PatientVisits ||--o{ Latest_Differentiated_Rank : "deduped by PatientPK, SiteCode, VisitID"
    ODS_Care_CT_DefaulterTracing ||--o{ Latest_Latest_DefaulterTracing_Ordered_Rank : "deduped by PatientPK, SiteCode, VisitID"
    Latest_Latest_DefaulterTracing_Ordered_Rank ||--o{ visits_data : "source for defaulter tracing"
    Latest_Differentiated_Rank ||--o{ visits_data : "source for differentiated care"
    ODS_Care_CT_Patient ||--o{ visits_data : "join on PatientPKHash, SiteCode"
Loading

Class diagram for new deduplication CTEs in FactDefaulterTracing load

classDiagram
    class Latest_Differentiated_Ordered {
        +int num
        +int SiteCode
        +date VisitDate
        +int VisitID
        +string PatientPKHash
        +string DifferentiatedCare
    }
    class Latest_Differentiated_Rank {
        +int SiteCode
        +date VisitDate
        +int VisitID
        +string PatientPKHash
        +string DifferentiatedCare
    }
    class Latest_DefaulterTracing_Ordered {
        +int num
        +int SiteCode
        +int PatientPK
        +string PatientPKHash
        +string PatientIDHash
        +date VisitDate
        +int VisitID
        +string Comments
        +string TracingType
        +string TracingOutcome
        +string IsFinalTrace
    }
    class Latest_Latest_DefaulterTracing_Ordered_Rank {
        +int SiteCode
        +int PatientPK
        +string PatientPKHash
        +string PatientIDHash
        +date VisitDate
        +int VisitID
        +string Comments
        +string TracingType
        +string TracingOutcome
        +string IsFinalTrace
    }
    Latest_Differentiated_Ordered <|-- Latest_Differentiated_Rank : filter num=1
    Latest_DefaulterTracing_Ordered <|-- Latest_Latest_DefaulterTracing_Ordered_Rank : filter num=1
Loading

File-Level Changes

Change Details Files
Removed explicit transaction wrappers
  • Removed begin tran
  • Removed rollback tran
Scripts/NDWH/C&T FACT TABLES/load_FactDefaulterTracing.sql
Introduced CTE-based deduplication for differentiated care
  • Added Latest_Differentiated_Ordered CTE with row_number partitioned by SiteCode and PatientPK
  • Added Latest_Differentiated_Rank CTE filtering for num = 1
Scripts/NDWH/C&T FACT TABLES/load_FactDefaulterTracing.sql
Introduced CTE-based deduplication for defaulter tracing
  • Added Latest_DefaulterTracing_Ordered CTE with row_number partitioned by SiteCode and PatientPK
  • Added Latest_Latest_DefaulterTracing_Ordered_Rank CTE filtering for num = 1
Scripts/NDWH/C&T FACT TABLES/load_FactDefaulterTracing.sql
Updated visits_data CTE to use deduplicated sources and adjust fields
  • Switched data source from raw CT_DefaulterTracing to the deduplicated defaulter tracing CTE
  • Joined on Latest_Differentiated_Rank instead of intermediate raw CTE
  • Added SELECT DISTINCT to remove residual duplicates
  • Removed explicit collate usage and referenced new CTE columns
Scripts/NDWH/C&T FACT TABLES/load_FactDefaulterTracing.sql
Adjusted CTE syntax and formatting
  • Prefixed CTE chain with a leading semicolon
  • Cleaned up join formatting and whitespace
Scripts/NDWH/C&T FACT TABLES/load_FactDefaulterTracing.sql

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @DennisGibz - I've reviewed your changes - here's some feedback:

  • Remove the unnecessary distinct * in the ranking CTEs (row_number already deduplicates) and explicitly list columns to guard against future schema changes.
  • Since the begin tran/rollback tran were removed, consider wrapping the final insert in a transaction with TRY/CATCH so failures won’t leave the table in a partial state.
  • Simplify and standardize the verbose CTE names (e.g., rename Latest_Latest_DefaulterTracing_Ordered_Rank to something shorter) for better readability and maintainability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Remove the unnecessary `distinct *` in the ranking CTEs (row_number already deduplicates) and explicitly list columns to guard against future schema changes.
- Since the `begin tran`/`rollback tran` were removed, consider wrapping the final insert in a transaction with TRY/CATCH so failures won’t leave the table in a partial state.
- Simplify and standardize the verbose CTE names (e.g., rename `Latest_Latest_DefaulterTracing_Ordered_Rank` to something shorter) for better readability and maintainability.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

DROP TABLE [NDWH].[Fact].[FactDefaulterTracing];

with MFL_partner_agency_combination as (
;with MFL_partner_agency_combination as (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably misplaced semi colon ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nobert-mumo the semicolon just works fine.

,visits.VisitDate
,visits.visitID
from ODS.Care.CT_PatientVisits as visits
inner join ODS.[Intermediate].Intermediate_LastVisitDate as last_visit on visits.SiteCode = last_visit.SiteCode
Copy link
Contributor

@nobert-mumo nobert-mumo Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz BTW what does the VisitID imply for this intermediate? Should we do the partitioning instead on this table so that it is consistent for all other Facts ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nobert-mumo I didn't get above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz I meant, my assumption is the addition of the visitID here is to make the rows unique, if that is the case, does it mean all other Facts that utilise Intermediate_LastVisitDate could also suffer the same issues as this fact ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Intermediate_LastVisitDate we don't use visitID on the partition. Actually I don't think we have any table that we use visitID as the unique Identifier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants