# Matching - AIC File Project

This project aims to enhance data quality, automate workflows, and improve accessibility for multiple projects, including Suplari Implementation, P Card, and Travel Analytics. The primary objective is to develop a robust, automated matching framework for AIC files that keeps data consistent across platforms and enables accurate reporting and analysis.

## Key Links and Resources

- EDP Access and Registration
- EDP Integrations and Connections
- EDP Support and Communication
- Finance and Data Platform Resources
- Access Requests and Permissions

## Project Structure

- `docs/` - Documentation files, including the data dictionary, architecture, and analysis notes.
- `config/` - Configuration files for database connections, environment settings, and mapping schemas.
- `sql/` - SQL scripts categorized into raw data extraction, transformations, aggregations, and matching logic.
- `etl/` - ETL pipeline code with modular functions for data normalization, matching, enrichment, and validation.
- `scripts/` - Utility scripts for validating matches, checking data quality, and generating reports.
- `data_samples/` - Sanitized sample data files, raw and processed, for testing ETL processes.

## Data Tables and Columns

The following tables and columns are key to the Matching - AIC File project:

| Table Name | Key Columns |
| --- | --- |
| GL_DAILY_RATES | Conversion Date, From Currency, To Currency, Conversion Rate |
| HR_OPERATING_UNITS | Organization ID, Name, Location |
| HZ_CUST_ACCOUNTS | Account Number, Customer ID, Customer Name |
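As an illustration of how these tables could feed the matching logic, the sketch below exact-matches a hypothetical AIC row to `HZ_CUST_ACCOUNTS` by Account Number and converts its amount to USD with `GL_DAILY_RATES`. The function names, record shapes, and AIC column names are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical matching/enrichment sketch. Column names for GL_DAILY_RATES and
# HZ_CUST_ACCOUNTS follow the table above; the AIC row fields are assumed.

def build_rate_lookup(gl_daily_rates):
    """Index GL_DAILY_RATES rows by (date, from_currency, to_currency)."""
    return {
        (r["Conversion Date"], r["From Currency"], r["To Currency"]): r["Conversion Rate"]
        for r in gl_daily_rates
    }

def match_aic_record(aic_row, cust_accounts, rates):
    """Exact-match on Account Number, then enrich with a USD-converted amount."""
    account = next(
        (a for a in cust_accounts if a["Account Number"] == aic_row["account_number"]),
        None,
    )
    if account is None:
        return None  # unmatched rows are handled elsewhere (e.g., fuzzy matching)
    rate = rates.get((aic_row["date"], aic_row["currency"], "USD"), 1.0)
    return {
        "customer_id": account["Customer ID"],
        "customer_name": account["Customer Name"],
        "amount_usd": round(aic_row["amount"] * rate, 2),
    }
```

In practice the lookups would come from the `sql/` extraction scripts rather than in-memory lists, but the key structure (a composite rate key and an account-number join) would stay the same.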

## Getting Started

1. Clone the repository.
2. Install required packages: `pip install -r requirements.txt`.
3. Set up database connections by configuring `db_connections.yaml` in the `config/` folder.
4. Prepare sample data in the `data_samples/` folder for initial testing.
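Step 3 expects a `db_connections.yaml`; a minimal sketch of what such a file might contain follows. All keys and values here are illustrative assumptions, since the actual schema is defined in `config/`:

```yaml
# Illustrative only -- the real keys depend on the schema in config/.
source_db:
  host: example-host
  port: 1521
  service_name: EXAMPLE
  user: etl_user
  password_env_var: SOURCE_DB_PASSWORD  # read from the environment, not stored here
```

Keeping credentials out of the file itself (e.g., referencing an environment variable, as sketched above) avoids committing secrets to the repository.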

## Running the ETL Pipeline

The main ETL pipeline is located in `etl/main_etl_pipeline.py`. You can run the entire ETL process or execute individual modules as needed:

```shell
python etl/main_etl_pipeline.py
```
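The modular layout described above might be wired together along these lines. This is a hedged sketch only; the actual module and function names in `etl/` may differ:

```python
# Hypothetical ETL driver showing how modular stages could compose.
# Each stage takes and returns a list of dict records.

def normalize(rows):
    """Trim and lowercase string fields so matching is case-insensitive."""
    return [
        {k: v.strip().lower() if isinstance(v, str) else v for k, v in r.items()}
        for r in rows
    ]

def match(rows, reference):
    """Keep only rows whose account appears in the reference set."""
    known = {r["account"] for r in reference}
    return [r for r in rows if r["account"] in known]

def validate(rows):
    """Drop rows with missing amounts before loading."""
    return [r for r in rows if r.get("amount") is not None]

def run_pipeline(raw_rows, reference):
    """Run the stages in order: normalize -> match -> validate."""
    return validate(match(normalize(raw_rows), reference))
```

Structuring each stage as a pure list-in/list-out function makes it easy to run one module in isolation, as the text above suggests.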

## Scripts

- `validate_matches.py` - Checks matching accuracy and reports match-rate metrics.
- `check_data_quality.py` - Runs data quality checks to identify inconsistencies.
- `generate_reports.py` - Generates reports on the matching process and its progress.
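For example, the match-rate metrics reported by `validate_matches.py` might be computed along these lines (a sketch under assumptions: the `matched` flag and function name are illustrative, not the script's actual interface):

```python
def match_metrics(records):
    """Summarize match outcomes for rows tagged with a boolean 'matched' flag."""
    total = len(records)
    matched = sum(1 for r in records if r.get("matched"))
    rate = round(100.0 * matched / total, 1) if total else 0.0
    return {
        "total": total,
        "matched": matched,
        "unmatched": total - matched,
        "match_rate_pct": rate,
    }
```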

## Key Documentation

- `data_dictionary.md` - Detailed documentation of tables, fields, and relationships.
- `architecture.md` - High-level overview of the data architecture and system integration.
- `troubleshooting.md` - Common issues and solutions for data and ETL challenges.

## Contributing

For contributions, please adhere to the following guidelines:

1. Use branches for specific features or modules, with descriptive names (e.g., `feature/fuzzy_matching`).
2. Commit regularly with clear messages describing the changes made.
3. Document any new functions or scripts added to the repository.

## Contact

For questions or further assistance, please reach out to the project lead, Scott Morgan.
