-
Notifications
You must be signed in to change notification settings - Fork 3
Modified the fact manifest to remove the dups #685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Reviewer's GuideUpdated load_fact_manifest.sql to eliminate duplicate manifest entries by refactoring the Fact_manifest CTE to use SELECT DISTINCT and max-date subqueries instead of GROUP BY aggregates, introduced NOLOCK table hints, and removed redundant COLLATE clauses in partner/agency joins. Entity relationship diagram for deduplication in Fact_manifest CTEerDiagram
CT_FacilityManifest {
Id int
DateRecieved date
SiteCode int
Start date
End date
}
HTS_FacilityManifest {
Id int
DateArrived date
SiteCode int
Start date
End date
}
CBS_FacilityManifest {
Id int
DateArrived date
SiteCode int
Start date
End date
}
ALL_EMRSites {
MFL_Code int
emr varchar
}
CT_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
HTS_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
CBS_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @DennisGibz - I've reviewed your changes and they look great!
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| SDP_Agency as Agency | ||
| from ODS.Care.All_EMRSites | ||
| ), | ||
| Fact_manifest as( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz it's a bit hard to see the changes from the diff.
What is the main change to take care of the dups ?
| ,cast(getdate() as date) as LoadDate | ||
| FROM ODS.Care.CBS_FacilityManifest(NoLock) m | ||
| INNER JOIN ( | ||
| SELECT [SiteCode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manifest is supposed to return unduplicated records for each site on the date it uploads and not the latest record that a site uploads .
…Fact_Manifest_Dups_Alignment
…ed is the same and IDs are unique
|
@nobert-mumo / @Marymary-dev the changes are a bit clear now. The dups were caused by the ID column which is usually generated by DWAPI when inserting int the dwapicentral. |
|
@Marymary-dev I hope your concerns were addressed by the most recent adjustments to the query. |
| cbs_fm.[Id] = tm.MaxmanifestId | ||
| ) | ||
| SELECT | ||
| FactKey= IDENTITY(INT,1,1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz a suggestion, instead on the extra changes on each of the union, could we just select the max(manifestId) & then group by the columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nobert-mumo every union has its own source of data and the group by will be duplicated across the unions.
I think its just the approach of writing the query which at the end of the day gives the same results,ama?
I know you aren't the best friend of subqueries..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz I meant introducing a group by on this final select. You will only need one group by to eliminate the dups on manifest_id
|
@nobert-mumo above is now factor in. The query gives same results as the one for the subquery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DennisGibz LGTM
Summary by Sourcery
Refactor fact_manifest CTE to remove duplicate manifest records by selecting the latest entry per site and streamline joins.
Enhancements: