This production-grade data pipeline automates the end-to-end calculation of customer referral-to-win scores for partnered vendors.
It integrates multiple systems (SQL Server, Salesforce, Google Sheets) and applies robust business logic to filter, transform, and update weekly partner performance metrics.
## 🔥 Why This Project Matters
This solution is used to track sales conversions, identify top-performing partners, and surface operational insights — replacing manual Excel reports with a fully automated, reliable process.
I pull data from SQL Server, process it with Python, and push the results to Google Sheets and the Salesforce CRM.
- Extract: Retrieves opportunity and partner data from SQL Server for a 90-day period (120 to 30 days ago).
- Transform: Processes data with Python and pandas to:
  - Filter opportunities based on date, product interest, and status.
  - Merge opportunity and partner data.
  - Calculate customer experience scores and stage-specific metrics.
- Load: Updates Google Sheets with summary and stage-breakdown reports and pushes scores to Salesforce.
- Automation: Scheduled weekly via Airflow.
- Scalability: Modular design supports additional data sources or metrics.
- Saves Time: Before this pipeline, teams spent hours each week manually collecting data and preparing results. Now, it’s all automated.
- Reduces Errors: Manual work often had mistakes. This pipeline keeps data clean and accurate.
- Improves Decisions: Stakeholders can now view partner performance easily and take timely action to improve client outcomes and retention.
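As a sketch of the Transform step, the filter → merge → score flow might look like the snippet below. The sample frames, column names, and stage labels are hypothetical stand-ins for the real SQL Server extracts, not the production schema:

```python
import pandas as pd

# Hypothetical stand-ins for the SQL Server extracts.
opps = pd.DataFrame({
    "opportunity_id": [1, 2, 3, 4],
    "partner_id": ["P001", "P001", "P002", "P002"],
    "stage": ["Closed Won", "Closed Lost", "Closed Won", "Referred"],
})
partners = pd.DataFrame({
    "partner_id": ["P001", "P002"],
    "name": ["Partner A", "Partner B"],
})

# Merge opportunities with partner metadata, then compute Closed Won %.
merged = opps.merge(partners, on="partner_id", how="left")
summary = (
    merged.groupby(["partner_id", "name"])
    .agg(
        referred=("opportunity_id", "count"),
        closed_won=("stage", lambda s: (s == "Closed Won").sum()),
    )
    .reset_index()
)
summary["score"] = (100 * summary["closed_won"] / summary["referred"]).round(1)
```

The real pipeline applies the same pattern with the production date, product-interest, and status filters added before the merge.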
Category | Highlights |
---|---|
Data Engineering | SQL extraction, pyodbc, Airflow scheduling |
ETL & Automation | End-to-end DAG orchestration with retry logic |
Data Transformation | Pandas filtering, grouping, scoring, enrichment |
External Integrations | Google Sheets API, Salesforce Python SDK |
Cloud & Secrets | Airflow BaseHook for connection security |
Code Quality | Modular functions, clear naming, exception handling |
Real-World Use Case | Referral pipeline from CRM to executive scorecard |
```mermaid
graph TD;
    A["SQL Server<br>Opportunity + Partner Tables"] --> B["Python<br>(Pandas ETL)"];
    B --> C["Filtered + Enriched DataFrame"];
    C --> D["Score Calculation<br>+ Stage Breakdown"];
    D --> E["Google Sheets"];
    D --> F["Salesforce API<br>Score Update"];
    A -->|Airflow DAG Trigger| B;
```
Property | Value |
---|---|
DAG Name | data_processing_pipeline |
Trigger | Every Tuesday at 7PM |
Retries | 1 retry (5-min delay) |
Alerts | Email on failure & retry |
Airflow Owner | data_team |
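The settings in the table above map onto Airflow's standard `default_args` and a cron schedule. This is a hedged sketch following Airflow conventions, not the project's actual DAG file:

```python
from datetime import timedelta

# Retry, alerting, and ownership settings from the DAG configuration table.
default_args = {
    "owner": "data_team",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,
    "email_on_retry": True,
}

# "Every Tuesday at 7 PM" as a cron expression: minute hour day month weekday.
schedule = "0 19 * * 2"
```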
Step | Description |
---|---|
1️⃣ Extract | SQL Server queries pull referrals from 120 to 30 days ago |
2️⃣ Filter | Only includes Care at Home / Home Care with valid referral status |
3️⃣ Merge | Opportunities merged with partner metadata |
4️⃣ Score | Calculates Closed Won % and assigns tier group |
5️⃣ Report | Uploads summary and stage-level breakdown to Google Sheets |
6️⃣ Update | Pushes calculated scores back to Salesforce via API |
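The Extract step's rolling window (120 to 30 days ago, a 90-day span) can be computed with the standard library; the function name here is illustrative, not from the source code:

```python
from datetime import date, timedelta

def extraction_window(today):
    """Return (start, end) dates for the extract: opportunities created
    between 120 and 30 days before `today` — a 90-day span."""
    return today - timedelta(days=120), today - timedelta(days=30)

# Example with a fixed date to keep the result reproducible:
start, end = extraction_window(date(2024, 5, 1))
```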
The pipeline produces two datasets:

- Summary Report (Google Sheets `Summary` tab):

partner_id | name | partner_state | referred | closed_won | referral | score | group |
---|---|---|---|---|---|---|---|
P001 | Partner A | CA | 100 | 60 | 80 | 75.0 | 100 - 60 |
P002 | Partner B | NY | 50 | 20 | 40 | 50.0 | 50 - 59 |

- Stage Breakdown (Google Sheets `Breakdown by Stage` tab):

partner_id | name | partner_state | referred | referred_newly_funded | referral_newly_funded | closed_won_newly_funded | score_newly_funded | referred_switching | referral_switching | closed_won_switching | score_switching |
---|---|---|---|---|---|---|---|---|---|---|---|
P001 | Partner A | CA | 100 | 60 | 50 | 45 | 75.0 | 40 | 30 | 15 | 50.0 |
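The `group` column assigns each partner a score tier. The boundaries below are inferred from the two sample rows and are an assumption, not the confirmed business rule:

```python
def score_group(score):
    """Assign a tier label from the Closed Won % score.
    Tier boundaries are inferred from the sample output (assumption)."""
    if score >= 60.0:
        return "100 - 60"
    if score >= 50.0:
        return "50 - 59"
    return "below 50"
```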
Conn ID | Description |
---|---|
`sql_conn` | SQL Server (pyodbc) connection |
`gsheets_conn` | Google Service Account key (Sheets API) |
`salesforce_conn` | Salesforce credentials (username + token) |
`partner_score_sheet_id` | Google Sheet ID (stored as Airflow Variable) |
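Keeping credentials in Airflow connections means the operator builds its pyodbc connection string from whatever `BaseHook.get_connection("sql_conn")` returns. A minimal sketch of that pattern, using a stand-in connection object and an assumed ODBC driver name:

```python
from types import SimpleNamespace

def build_sqlserver_dsn(conn):
    """Build a pyodbc connection string from an Airflow-style connection
    object (anything exposing host/schema/login/password, e.g. the result
    of BaseHook.get_connection("sql_conn")). Driver name is an assumption."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={conn.host};DATABASE={conn.schema};"
        f"UID={conn.login};PWD={conn.password}"
    )

# Stand-in connection object for illustration (hypothetical values):
demo = SimpleNamespace(host="db.example.com", schema="crm",
                       login="svc_user", password="***")
dsn = build_sqlserver_dsn(demo)
```

No secret ever appears in the DAG file itself; only the Conn ID does.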
- Add connections in the Airflow Admin UI (`sql_conn`, `gsheets_conn`, `salesforce_conn`)
- Set the Airflow Variable `partner_score_sheet_id`
- Place the DAG in `~/airflow/dags/`
- Start the scheduler and webserver:

  ```shell
  airflow scheduler
  airflow webserver
  ```

- Trigger manually or wait for the schedule (Tuesday 7 PM)
Install dependencies:

```shell
pip install -r requirements.txt
```

`requirements.txt`:

```
apache-airflow
pandas
pyodbc
simple-salesforce
google-api-python-client
google-auth
```
- Add error handling with Airflow notifications (e.g., Slack alerts).
- Optimize SQL queries for large datasets using indexing.
- Extend to include additional metrics (e.g., revenue, engagement).
- Integrate with BI tools (e.g., Power BI, Tableau) for real-time dashboards.
- Enhance logging with structured formats for better monitoring.
- DAG Code: Sets up the Airflow schedule and tasks.
- Operator Code: Handles the data processing and connections.