Skip to content

SahilSuman1011/rcm-data-project

Repository files navigation

Healthcare Revenue Cycle Management (RCM) – Azure Data Engineering Project

📌 Project Overview

This project implements an end-to-end data engineering pipeline on Azure to solve challenges in the Healthcare Revenue Cycle Management (RCM) domain. RCM ensures hospitals manage financial processes effectively, from patient registration to claim settlement.

The solution builds an automated Medallion Architecture (Bronze → Silver → Gold) to ingest data from multiple sources (EMR, Claims, APIs), transform it into a common data model (CDM) with Slowly Changing Dimensions (SCD2), and deliver fact/dimension tables for analytics and KPI reporting such as Accounts Receivable > 90 Days and Days in AR.


🎯 Business Objective

  • Streamline Accounts Receivable (AR) analysis to reduce outstanding collection periods.
  • Track KPIs such as:
    • AR > 90 days (aging analysis).
    • Days in AR (average collection period).
  • Enable hospitals to monitor payments, claims, and follow-ups effectively.
  • Provide stakeholders with actionable insights to improve revenue cycle efficiency.

🏗️ Solution Architecture

Architecture: Medallion Framework (Landing → Bronze → Silver → Gold)

  1. Data Sources

    • EMR (Azure SQL DB) → Patients, Providers, Transactions, Encounters, Departments
    • Claims Data (Flat Files → ADLS Gen2 Landing)
    • NPI & ICD Codes (Public APIs → ADLS Gen2 Bronze)
    • CPT Codes (Flat Files → ADLS Gen2 Landing)
  2. Data Pipeline

    • Azure Data Factory (ADF) → Orchestration & metadata-driven ingestion
    • Azure Data Lake Gen2 (ADLS) → Landing, Bronze, Silver, Gold storage
    • Azure Databricks → Transformations, CDM, SCD2 implementation
    • Delta Lake → Optimized storage for Silver & Gold layers
    • Azure Key Vault → Secure credential management
  3. Medallion Layers

    • Landing → Raw Claims/Flat Files, API dumps
    • Bronze → Source-of-truth in Parquet format
    • Silver → Data Cleaning, SCD2, Common Data Model (CDM), Quality Checks
    • Gold → Aggregated Fact & Dimension tables for BI & analytics
  4. Consumption

    • Gold layer feeds into BI/Analytics teams for KPI dashboards (e.g., AR trends, Payment delays).

⚙️ Technology Stack

  • Azure Data Factory (ADF) – Orchestration, Scheduling, Metadata-driven pipelines
  • Azure Data Lake Gen2 (ADLS) – Data storage (Landing/Bronze/Silver/Gold)
  • Azure Databricks (PySpark, Delta Lake) – Data transformation, SCD2, CDM
  • Azure SQL DB – Source EMR system
  • Azure Key Vault – Secrets & credential management
  • Unity Catalog (Databricks) – Governance & cataloging
  • Power BI (Future Scope) – KPI Reporting & visualization

📊 Key Features

  • Automated Ingestion from SQL DB, Flat Files, and APIs.
  • Audit Logging to track loads and incremental pipelines.
  • Incremental & Full Loads based on config-driven pipelines.
  • SCD2 Implementation for patient, transactions, and claims history.
  • Data Quality Checks (quarantine invalid records).
  • Parallelized Pipelines for efficiency.
  • Security & Governance via Key Vault and Unity Catalog.

📂 Folder Structure in ADLS

azureProject/
│── landing/
│── bronze/
│── silver/
│── gold/
│── configs/
    └── emr/load_config.csv

📈 Example KPIs Delivered

  • % of AR > 90 days = (AR > 90 days) / Total AR
  • Days in AR = Total AR / Avg. Daily Charges
  • Provider-level AR aging trends
  • Claim settlement rates (Insurance vs Patient)

🚀 Outcomes

  • Enabled hospitals to reduce AR aging with accurate insights.
  • Provided fact/dimension models for scalable BI reporting.
  • Automated pipelines reduced manual efforts in ETL by >60%.
  • Ensured secure, governed, and reliable data delivery for RCM analytics.

About

Azure Data Engineering Stack - Healthcare Revenue Cycle Management (RCM)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published