A Java-based ETL data validator tool to ensure CSV file integrity before database insertion. Built with Maven, JUnit, and PostgreSQL. Features schema validation, duplicate detection, and batch processing.

Psb-bit/ETL-Data-Validator

ETL Data Validator (Java)

Features

  • Schema Validation: Checks headers, data types, and required fields.
  • Duplicate Detection: Flags duplicate IDs/records.
  • Batch Database Writes: Inserts valid rows in batches via JDBC.
  • Modular Design: Separates reading, validation, and writing logic.
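
The schema and duplicate checks listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual validator: the class name `CsvValidationSketch`, the expected header (borrowed from the sample input), and the column types (integer `id`, required `name`, numeric `amount`) are all assumptions.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the repository's actual validator class. */
final class CsvValidationSketch {

    // Assumed schema, taken from the sample input: id,name,amount.
    static final List<String> EXPECTED_HEADER = Arrays.asList("id", "name", "amount");

    /** Returns one message per problem found; an empty list means the input passed. */
    static List<String> validate(List<String> lines) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("file is empty");
            return errors;
        }
        // Header check: column names and their order must match exactly.
        List<String> header = Arrays.asList(lines.get(0).trim().split(",", -1));
        if (!header.equals(EXPECTED_HEADER)) {
            errors.add("line 1: unexpected header " + header);
            return errors;
        }
        Set<Long> seenIds = new HashSet<>();
        for (int i = 1; i < lines.size(); i++) {
            int lineNo = i + 1;
            // Naive split: quoted fields and embedded commas are not handled here.
            String[] cols = lines.get(i).split(",", -1);
            if (cols.length != EXPECTED_HEADER.size()) {
                errors.add("line " + lineNo + ": expected 3 columns, found " + cols.length);
                continue;
            }
            // Required-field check (assumption: name must not be blank).
            if (cols[1].trim().isEmpty()) {
                errors.add("line " + lineNo + ": name is required");
            }
            // Type check plus duplicate detection on id.
            try {
                long id = Long.parseLong(cols[0].trim());
                if (!seenIds.add(id)) {
                    errors.add("line " + lineNo + ": duplicate id " + id);
                }
            } catch (NumberFormatException e) {
                errors.add("line " + lineNo + ": id is not an integer");
            }
            // Type check on amount.
            try {
                new BigDecimal(cols[2].trim());
            } catch (NumberFormatException e) {
                errors.add("line " + lineNo + ": amount is not numeric");
            }
        }
        return errors;
    }
}
```

Returning a list of messages instead of throwing on the first failure lets a caller report every bad row in one pass. For CSV files with quoted fields, a parser such as OpenCSV would be a better fit than the plain `split` used here.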

Tech Stack

  • Java 8
  • Maven
  • JUnit 5
  • PostgreSQL (JDBC)
  • OpenCSV (optional)

How to Run

  1. Prerequisites:

    • Java 8+, Maven, PostgreSQL.
    • Create a target table whose columns match your CSV header (for the sample input: id, name, amount).
  2. Clone the repo:

    git clone https://github.com/Psb-bit/ETL-Data-Validator.git
    cd ETL-Data-Validator
  3. Configure database: Update DbWriter.java with your DB credentials:

    String url = "jdbc:postgresql://localhost:5432/your_db";
    String user = "your_user";
    String password = "your_password";
  4. Run the validator:

    mvn clean compile exec:java -Dexec.mainClass="com.etl.automation.ETLRunner"
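
For orientation, a batched JDBC write of the kind configured in step 3 might look like the sketch below. It is not the repository's `DbWriter`: the class name `BatchInsertSketch`, the table name `records`, its column types, and the batch size of 500 are assumptions chosen for illustration.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the repository's DbWriter may be structured differently. */
final class BatchInsertSketch {

    // Assumed table and columns; adjust to the table created in the prerequisites.
    static final String INSERT_SQL =
            "INSERT INTO records (id, name, amount) VALUES (?, ?, ?)";

    // Assumed number of rows sent per executeBatch() call.
    static final int BATCH_SIZE = 500;

    /** Splits rows into consecutive chunks of at most {@code size} elements. */
    static <T> List<List<T>> partition(List<T> rows, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += size) {
            chunks.add(rows.subList(start, Math.min(start + size, rows.size())));
        }
        return chunks;
    }

    /** Inserts already-validated rows in one transaction, one batch per chunk. */
    static void insertAll(Connection conn, List<String[]> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (List<String[]> chunk : partition(rows, BATCH_SIZE)) {
                for (String[] row : chunk) {
                    ps.setLong(1, Long.parseLong(row[0]));
                    ps.setString(2, row[1]);
                    ps.setBigDecimal(3, new BigDecimal(row[2]));
                    ps.addBatch();
                }
                ps.executeBatch(); // send the whole chunk to the database at once
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // keep the table unchanged if any batch fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller would open a connection with `DriverManager.getConnection(url, user, password)`, using the values from step 3, and pass it to `insertAll` together with the rows that passed validation. Running every batch inside one transaction means a failure partway through rolls back all rows, not only the failing chunk.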

Sample Input

resources/sample.csv:

id,name,amount
1,Alice,1000
2,Bob,1500
3,Charlie,2000
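
As a rough illustration of the reading stage, the sample rows above could be mapped to typed values like this. The class `SampleRow` and the chosen field types (`long`, `String`, `BigDecimal`) are assumptions, not part of the repository.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical typed view of one sample row; the field types are assumptions. */
final class SampleRow {
    final long id;
    final String name;
    final BigDecimal amount;

    SampleRow(long id, String name, BigDecimal amount) {
        this.id = id;
        this.name = name;
        this.amount = amount;
    }

    /** Parses one data line such as "1,Alice,1000"; throws IllegalArgumentException on bad input. */
    static SampleRow parse(String line) {
        // Naive split: quoted fields and embedded commas are not handled here.
        String[] cols = line.split(",", -1);
        if (cols.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        // NumberFormatException from the numeric parsers is an IllegalArgumentException.
        return new SampleRow(
                Long.parseLong(cols[0].trim()),
                cols[1].trim(),
                new BigDecimal(cols[2].trim()));
    }

    /** Parses every line after the header line. */
    static List<SampleRow> parseAll(List<String> lines) {
        List<SampleRow> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            rows.add(parse(lines.get(i)));
        }
        return rows;
    }
}
```

`BigDecimal` is used for `amount` on the assumption that it holds monetary values, where binary floating point could introduce rounding errors.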

Testing

Run JUnit tests:

mvn test

CI/CD

Example Jenkins pipeline (save as Jenkinsfile):

pipeline {
    agent any
    stages {
        stage('Build') { steps { sh 'mvn clean compile' } }
        stage('Test') { steps { sh 'mvn test' } }
    }
}

License

MIT
