Mini Project 2: Containerized ETL Data Pipeline

This project demonstrates a fundamental ETL (Extract, Transform, Load) data pipeline pattern using Python and a PostgreSQL database, all running as isolated services orchestrated by Docker Compose. The key focus is on managing a stateful service (the database) and ensuring data persistence across container lifecycles using Docker Volumes.


Core Concepts Demonstrated

  • Stateful Services in Docker: Managed a PostgreSQL database, a stateful service where data persistence is critical.
  • Data Persistence with Docker Volumes: Utilized a named Docker Volume (db-data) to store the PostgreSQL data on the host machine, ensuring that data survives container restarts and re-creations (see the compose sketch after this list).
  • Containerized Microservices: Built a multi-container application simulating a real-world pipeline with distinct services: a data producer, a database, and a data consumer.
  • Service-to-Service Networking: Established reliable communication between Python application containers and the database container using Docker's internal networking and service names.
  • Resilient Application Design: Implemented a connection retry loop in the Python scripts, making them robust against timing issues where an application might start before the database is fully initialized (see the Python sketch after the Technologies list).
  • Dependency Management: Managed Python library dependencies (psycopg2, Faker) using a requirements.txt file for reproducible builds.
  • Idempotent Database Setup: Used the CREATE TABLE IF NOT EXISTS SQL command to ensure the database schema setup can be run multiple times without causing errors.
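
The volume and networking ideas above meet in the Compose file itself. The sketch below shows the general shape such a docker-compose.yml can take; apart from the db-data volume and the consumer-app service name, which appear elsewhere in this README, the service names, image tag, and credentials are illustrative assumptions, not the repository's actual configuration:

    version: "3.8"

    services:
      db:
        image: postgres:15                    # official PostgreSQL image (tag assumed)
        environment:
          POSTGRES_USER: pipeline             # assumed credentials
          POSTGRES_PASSWORD: pipeline
          POSTGRES_DB: pipeline
        volumes:
          - db-data:/var/lib/postgresql/data  # named volume: data outlives the container

      producer-app:                           # service name assumed
        build: .
        command: python producer.py
        depends_on:
          - db                                # start order only; the retry loop handles readiness

      consumer-app:
        build: .
        command: python consumer.py
        depends_on:
          - db

    volumes:
      db-data:                                # declares the named volume

Inside the Compose network, the Python containers reach PostgreSQL simply as host db: Docker's internal DNS resolves each service name to its container.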

Technologies Used

  • Containerization: Docker, Docker Compose
  • Database: PostgreSQL (Official Docker Image)
  • Programming Language: Python 3
  • Key Python Libraries: psycopg2-binary (for PostgreSQL connection), Faker (for test data generation)
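
These two libraries, combined with the retry loop and idempotent schema setup from the Core Concepts list, cover most of what the producer needs. The following is a minimal sketch of that pattern; the host name db, the credentials, and the users table are assumptions carried over from the compose sketch above, not code copied from the repository:

    import time

    import psycopg2
    from faker import Faker

    def connect_with_retry(retries=10, delay=3):
        """Retry until PostgreSQL accepts connections; Compose may start us first."""
        for attempt in range(1, retries + 1):
            try:
                return psycopg2.connect(
                    host="db",            # the Compose service name, not an IP or localhost
                    dbname="pipeline",    # assumed credentials
                    user="pipeline",
                    password="pipeline",
                )
            except psycopg2.OperationalError:
                print(f"Database not ready (attempt {attempt}/{retries}); retrying...")
                time.sleep(delay)
        raise RuntimeError("could not connect to the database")

    conn = connect_with_retry()
    fake = Faker()
    with conn, conn.cursor() as cur:        # the connection context commits on success
        # Idempotent schema setup: safe to run on every start.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id SERIAL PRIMARY KEY, name TEXT, email TEXT)"
        )
        for _ in range(5):                  # one 5-user batch per run, as in the steps below
            cur.execute(
                "INSERT INTO users (name, email) VALUES (%s, %s)",
                (fake.name(), fake.email()),
            )
    conn.close()

Connecting by the service name rather than localhost is what makes the container-to-container networking work, and the retry loop absorbs the window in which the PostgreSQL container is running but the server is still initializing.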

How to Run the Pipeline

Prerequisites: Docker and Docker Compose must be installed.

  1. Clone the repository:

    git clone https://github.com/YogeshT22/local-docker-data-pipeline
    cd local-docker-data-pipeline
  2. Run the entire pipeline in detached mode: This command will build the Python application image and start all three services (producer, consumer, and database) in the correct order.

    docker-compose up --build -d
  3. Wait for the pipeline to execute: Allow about 20-30 seconds for the producer and consumer scripts to complete their work in the background.

  4. Check the consumer's output: To see the final report generated by the consumer.py script, view its logs:

    docker-compose logs consumer-app

    On the first run, you should see a report of 5 users found in the database. (A minimal sketch of such a consumer script appears after these steps.)

  5. Test Data Persistence: Run the up command again to add a second batch of data; the named volume preserves the data from the first run.

    docker-compose up -d

    Now, check the consumer logs again after waiting a few seconds. You should see a report of 10 users (the original 5 plus 5 new ones), proving that the data was successfully persisted in the Docker Volume.

  6. Clean Up: To stop and remove the containers, network, and persisted data volume (omit the -v flag to keep the volume and its data for future runs), run:

    docker-compose down -v
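
For context on the reports read in steps 4 and 5 above, the consumer only has to connect the same way and count rows. A minimal sketch under the same assumed names as the producer sketch (retry loop omitted for brevity):

    import psycopg2

    conn = psycopg2.connect(
        host="db", dbname="pipeline", user="pipeline", password="pipeline"
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM users")
        (total,) = cur.fetchone()
        print(f"Report: {total} users found in the database")
        cur.execute("SELECT name, email FROM users ORDER BY id")
        for name, email in cur.fetchall():
            print(f"  - {name} <{email}>")
    conn.close()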

Architecture Diagram

See architecture-diagram.png in the repository for a visual overview of the producer, database, and consumer services and the db-data volume.

License

This project is licensed under the MIT License - see the LICENSE file for details.
