Skip to content

Conversation

@DennisGibz
Copy link
Contributor

@DennisGibz DennisGibz commented Aug 4, 2025

Summary by Sourcery

Refactor fact_manifest CTE to remove duplicate manifest records by selecting the latest entry per site and streamline joins.

Enhancements:

  • Replace GROUP BY aggregations with DISTINCT rows joined to subqueries that pick the max date per site in CT, HTS, and PKV manifests
  • Apply NOLOCK hint on manifest tables for consistent behavior
  • Remove explicit COLLATE clauses from partner and agency dimension joins

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Aug 4, 2025

Reviewer's Guide

Updated load_fact_manifest.sql to eliminate duplicate manifest entries by refactoring the Fact_manifest CTE to use SELECT DISTINCT and max-date subqueries instead of GROUP BY aggregates, introduced NOLOCK table hints, and removed redundant COLLATE clauses in partner/agency joins.

Entity relationship diagram for deduplication in Fact_manifest CTE

erDiagram
    CT_FacilityManifest {
        Id int
        DateRecieved date
        SiteCode int
        Start date
        End date
    }
    HTS_FacilityManifest {
        Id int
        DateArrived date
        SiteCode int
        Start date
        End date
    }
    CBS_FacilityManifest {
        Id int
        DateArrived date
        SiteCode int
        Start date
        End date
    }
    ALL_EMRSites {
        MFL_Code int
        emr varchar
    }
    CT_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
    HTS_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
    CBS_FacilityManifest ||--|| ALL_EMRSites : "SiteCode = MFL_Code"
Loading

File-Level Changes

Change Details Files
Refactored Fact_manifest CTE to deduplicate manifests
  • Replaced GROUP BY and MAX aggregates with subqueries that filter for the latest DateRecieved/DateArrived per SiteCode
  • Switched to SELECT DISTINCT for CT, HTS, and PKV manifest queries
  • Unified the query structure across all docket types
Scripts/NDWH/REPORTING RATES/load_fact_manifest.sql
Added NOLOCK hints to manifest table scans
  • Applied (NoLock) hint on ODS.Care.ct_FacilityManifest, ODS.HTS.HTS_FacilityManifest, and ODS.Care.CBS_FacilityManifest extracts
Scripts/NDWH/REPORTING RATES/load_fact_manifest.sql
Simplified partner and agency joins by removing COLLATE clauses
  • Dropped explicit Latin1_General_CI_AS collations from DimPartner and DimAgency join conditions
Scripts/NDWH/REPORTING RATES/load_fact_manifest.sql

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @DennisGibz - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

SDP_Agency as Agency
from ODS.Care.All_EMRSites
),
Fact_manifest as(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz it's a bit hard to see the changes from the diff.
What is the main change to take care of the dups ?

,cast(getdate() as date) as LoadDate
FROM ODS.Care.CBS_FacilityManifest(NoLock) m
INNER JOIN (
SELECT [SiteCode]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest is supposed to return unduplicated records for each site on the date it uploads and not the latest record that a site uploads .

@DennisGibz
Copy link
Contributor Author

@nobert-mumo / @Marymary-dev the changes are a bit clear now. The dups were caused by the ID column which is usually generated by DWAPI when inserting int the dwapicentral.

@DennisGibz DennisGibz changed the title Modified the fact to remove the dups Modified the fact manifest to remove the dups Aug 13, 2025
@DennisGibz
Copy link
Contributor Author

@Marymary-dev I hope your concerns were addressed by the most recent adjustments to the query.

cbs_fm.[Id] = tm.MaxmanifestId
)
SELECT
FactKey= IDENTITY(INT,1,1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz a suggestion, instead on the extra changes on each of the union, could we just select the max(manifestId) & then group by the columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nobert-mumo every union has its own source of data and the group by will be duplicated across the unions.

I think its just the approach of writing the query which at the end of the day gives the same results,ama?

I know you aren't the best friend of subqueries..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz I meant introducing a group by on this final select. You will only need one group by to eliminate the dups on manifest_id

@DennisGibz
Copy link
Contributor Author

@nobert-mumo above is now factor in. The query gives same results as the one for the subquery.

Copy link
Contributor

@nobert-mumo nobert-mumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisGibz DennisGibz requested review from Marymary-dev and removed request for kevlanyo October 16, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants