A Java-based ETL data validator that checks CSV file integrity before rows are inserted into a database. Built with Maven, JUnit, and PostgreSQL, it provides schema validation, duplicate detection, and batch processing.
Features:
- Schema Validation: Checks headers, data types, and required fields.
- Duplicate Detection: Flags duplicate IDs/records.
- Batch Database Writes: Efficiently inserts valid data via JDBC.
- Modular Design: Separates reading, validation, and writing logic.
Tech stack:
- Java 8
- Maven
- JUnit 5
- PostgreSQL (JDBC)
- OpenCSV (optional)
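The schema and duplicate checks listed above can be illustrated with a minimal sketch. It assumes the three-column header from resources/sample.csv (id, name, amount), treats every field as required, and uses a naive comma split in place of OpenCSV, so quoted fields are not handled. The class and method names (CsvValidatorSketch, validate) are hypothetical and are not taken from the project's source.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the project's actual validator class. */
class CsvValidatorSketch {

    // Assumed schema: the header of resources/sample.csv.
    static final List<String> EXPECTED_HEADER = Arrays.asList("id", "name", "amount");

    /** Returns one message per problem found; an empty list means the input passed. */
    static List<String> validate(List<String> lines) {
        List<String> errors = new ArrayList<>();
        if (lines.isEmpty()) {
            errors.add("file is empty");
            return errors;
        }
        // Header check: column names and order must match exactly.
        List<String> header = Arrays.asList(lines.get(0).trim().split(",", -1));
        if (!header.equals(EXPECTED_HEADER)) {
            errors.add("line 1: unexpected header " + header);
            return errors;
        }
        Set<String> seenIds = new HashSet<>();
        for (int i = 1; i < lines.size(); i++) {
            int lineNo = i + 1;
            // Naive split: does not handle quoted commas (OpenCSV would).
            String[] fields = lines.get(i).split(",", -1);
            if (fields.length != EXPECTED_HEADER.size()) {
                errors.add("line " + lineNo + ": expected " + EXPECTED_HEADER.size()
                        + " fields, found " + fields.length);
                continue;
            }
            String id = fields[0].trim();
            String name = fields[1].trim();
            String amount = fields[2].trim();
            if (id.isEmpty() || name.isEmpty() || amount.isEmpty()) {
                errors.add("line " + lineNo + ": missing required field");
                continue;
            }
            // Type checks: id must be an integer, amount must be numeric.
            try {
                Long.parseLong(id);
            } catch (NumberFormatException e) {
                errors.add("line " + lineNo + ": id is not an integer: " + id);
            }
            try {
                new BigDecimal(amount);
            } catch (NumberFormatException e) {
                errors.add("line " + lineNo + ": amount is not numeric: " + amount);
            }
            // Duplicate check: Set.add returns false if the id was already seen.
            if (!seenIds.add(id)) {
                errors.add("line " + lineNo + ": duplicate id " + id);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "id,name,amount", "1,Alice,1000", "2,Bob,abc", "2,Charlie,2000");
        validate(lines).forEach(System.out::println);
    }
}
```

Running main on these inline rows reports two problems: a non-numeric amount on line 3 and a duplicate id on line 4. Collecting messages instead of throwing on the first failure lets a single pass report every bad row.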
- Prerequisites:
  - Java 8+, Maven, and PostgreSQL.
  - Create a table matching your CSV schema.
- Clone the repo:
      git clone https://github.com/your-username/ETL-Data-Validator.git
- Configure the database: update DbWriter.java with your DB credentials:
      String url = "jdbc:postgresql://localhost:5432/your_db";
      String user = "your_user";
      String password = "your_password";
- Run the validator:
      mvn clean compile exec:java -Dexec.mainClass="com.etl.automation.ETLRunner"
Sample input (resources/sample.csv):
id,name,amount
1,Alice,1000
2,Bob,1500
3,Charlie,2000
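Rows like these, once validated, could be written with JDBC batching roughly as sketched here. This is an illustration under assumptions, not the project's DbWriter: the table name records, the column types (an integer id, a text name, a numeric amount), the class BatchInsertSketch, and its methods are all hypothetical.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of batched inserts; not the project's DbWriter. */
class BatchInsertSketch {

    /**
     * Builds a parameterized INSERT statement. Table and column names are
     * concatenated into the SQL, so they must be trusted constants.
     */
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    /** Inserts rows of {id, name, amount} in batches inside one transaction. */
    static int writeAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        // "records" is an assumed table name matching the sample CSV columns.
        String sql = insertSql("records", Arrays.asList("id", "name", "amount"));
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int written = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setLong(1, Long.parseLong(row[0]));
                ps.setString(2, row[1]);
                ps.setBigDecimal(3, new BigDecimal(row[2]));
                ps.addBatch();
                // Flush once the batch is full to bound memory use.
                if (++pending == batchSize) {
                    ps.executeBatch();
                    written += pending;
                    pending = 0;
                }
            }
            // Flush any remaining rows that did not fill a whole batch.
            if (pending > 0) {
                ps.executeBatch();
                written += pending;
            }
            conn.commit();
        } catch (SQLException e) {
            // Undo the partial load so the table never holds half a file.
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return written;
    }
}
```

A caller would obtain the Connection from DriverManager.getConnection(url, user, password) using the credentials configured earlier. Wrapping the whole load in one transaction is a design choice made for this sketch; committing per batch is an alternative when files are very large.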
Run JUnit tests:
mvn test
Example Jenkins pipeline (save as Jenkinsfile):
pipeline {
    agent any
    stages {
        stage('Build') { steps { sh 'mvn clean compile' } }
        stage('Test') { steps { sh 'mvn test' } }
    }
}
License: MIT