-
Notifications
You must be signed in to change notification settings - Fork 3
Worked on removing dups on the fact as well as removing the begin and… #686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
… rollback trans on the script
Reviewer's GuideThis PR simplifies the FactDefaulterTracing load script by removing explicit transaction boundaries, adding CTEs with row_number-based logic to deduplicate both differentiated care and defaulter tracing records, and updating the main visits_data CTE to consume these deduplicated sources and adjust field selections. Entity relationship diagram for deduplication logic in FactDefaulterTracing loaderDiagram
ODS_Care_CT_PatientVisits {
int SiteCode
int PatientPK
string PatientPKHash
string DifferentiatedCare
date VisitDate
int VisitID
}
ODS_Intermediate_Intermediate_LastVisitDate {
int SiteCode
int PatientPK
date LastVisitDate
}
ODS_Care_CT_DefaulterTracing {
int SiteCode
int PatientPK
string PatientPKHash
string PatientIDHash
date VisitDate
int VisitID
string Comments
string TracingType
string TracingOutcome
string IsFinalTrace
}
ODS_Care_CT_Patient {
string PatientPKHash
int SiteCode
date DOB
}
Latest_Differentiated_Rank {
int SiteCode
int PatientPK
string PatientPKHash
string DifferentiatedCare
date VisitDate
int VisitID
}
Latest_Latest_DefaulterTracing_Ordered_Rank {
int SiteCode
int PatientPK
string PatientPKHash
string PatientIDHash
date VisitDate
int VisitID
string Comments
string TracingType
string TracingOutcome
string IsFinalTrace
}
visits_data {
string PatientPKHash
string PatientIDHash
int SiteCode
int VisitID
date VisitDate
string TracingType
string TracingOutcome
string IsFinalTrace
string Comments
string DifferentiatedCare
int AgeAtVisit
}
ODS_Care_CT_PatientVisits ||--o{ ODS_Intermediate_Intermediate_LastVisitDate : "joins on SiteCode, PatientPK, VisitDate"
ODS_Care_CT_PatientVisits ||--o{ Latest_Differentiated_Rank : "deduped by PatientPK, SiteCode, VisitID"
ODS_Care_CT_DefaulterTracing ||--o{ Latest_Latest_DefaulterTracing_Ordered_Rank : "deduped by PatientPK, SiteCode, VisitID"
Latest_Latest_DefaulterTracing_Ordered_Rank ||--o{ visits_data : "source for defaulter tracing"
Latest_Differentiated_Rank ||--o{ visits_data : "source for differentiated care"
ODS_Care_CT_Patient ||--o{ visits_data : "join on PatientPKHash, SiteCode"
Class diagram for new deduplication CTEs in FactDefaulterTracing loadclassDiagram
class Latest_Differentiated_Ordered {
+int num
+int SiteCode
+date VisitDate
+int VisitID
+string PatientPKHash
+string DifferentiatedCare
}
class Latest_Differentiated_Rank {
+int SiteCode
+date VisitDate
+int VisitID
+string PatientPKHash
+string DifferentiatedCare
}
class Latest_DefaulterTracing_Ordered {
+int num
+int SiteCode
+int PatientPK
+string PatientPKHash
+string PatientIDHash
+date VisitDate
+int VisitID
+string Comments
+string TracingType
+string TracingOutcome
+string IsFinalTrace
}
class Latest_Latest_DefaulterTracing_Ordered_Rank {
+int SiteCode
+int PatientPK
+string PatientPKHash
+string PatientIDHash
+date VisitDate
+int VisitID
+string Comments
+string TracingType
+string TracingOutcome
+string IsFinalTrace
}
Latest_Differentiated_Ordered <|-- Latest_Differentiated_Rank : filter num=1
Latest_DefaulterTracing_Ordered <|-- Latest_Latest_DefaulterTracing_Ordered_Rank : filter num=1
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @DennisGibz - I've reviewed your changes - here's some feedback:
- Remove the unnecessary
distinct *in the ranking CTEs (row_number already deduplicates) and explicitly list columns to guard against future schema changes. - Since the
begin tran/rollback tranwere removed, consider wrapping the final insert in a transaction with TRY/CATCH so failures won’t leave the table in a partial state. - Simplify and standardize the verbose CTE names (e.g., rename
Latest_Latest_DefaulterTracing_Ordered_Rankto something shorter) for better readability and maintainability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Remove the unnecessary `distinct *` in the ranking CTEs (row_number already deduplicates) and explicitly list columns to guard against future schema changes.
- Since the `begin tran`/`rollback tran` were removed, consider wrapping the final insert in a transaction with TRY/CATCH so failures won’t leave the table in a partial state.
- Simplify and standardize the verbose CTE names (e.g., rename `Latest_Latest_DefaulterTracing_Ordered_Rank` to something shorter) for better readability and maintainability.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| DROP TABLE [NDWH].[Fact].[FactDefaulterTracing]; | ||
|
|
||
| with MFL_partner_agency_combination as ( | ||
| ;with MFL_partner_agency_combination as ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably misplaced semi colon ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nobert-mumo the semicolon just works fine.
| ,visits.VisitDate | ||
| ,visits.visitID | ||
| from ODS.Care.CT_PatientVisits as visits | ||
| inner join ODS.[Intermediate].Intermediate_LastVisitDate as last_visit on visits.SiteCode = last_visit.SiteCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz BTW what does the VisitID imply for this intermediate? Should we do the partitioning instead on this table so that it is consistent for all other Facts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nobert-mumo I didn't get above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz I meant, my assumption is the addition of the visitID here is to make the rows unique, if that is the case, does it mean all other Facts that utilise Intermediate_LastVisitDate could also suffer the same issues as this fact ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

… rollback trans on the script
Summary by Sourcery
Refine the load_FactDefaulterTracing.sql script by removing manual transaction control and adding CTE-based deduplication for differentiated care and defaulter tracing data before populating the fact table.
Enhancements: