
Weather Data ETL Using Kafka

This project simulates a real-time weather data streaming pipeline using Apache Kafka, designed to emulate IoT sensors that continuously transmit weather metrics such as temperature, humidity, and pressure. The architecture is built for scalability, fault tolerance, and parallelism using a multi-node Kafka setup and country-wise topic segmentation. The streamed data is ingested and stored in a PostgreSQL database, forming the foundation for downstream analytics, visualization, or machine learning applications.


Project Overview

  • Producers: Python-based Kafka producers read from a CSV (simulating real-time sensor data) and send randomized weather updates to country-specific Kafka topics. Each record includes attributes such as temperature, humidity, pressure, timestamp, and country (see the producer sketch after this list).

  • Kafka Cluster: A multi-node Kafka setup (configured via Docker Compose) ensures scalable, high-throughput, and fault-tolerant message streaming. Kafka topics are logically divided by country, enabling parallel data pipelines for different regions.

  • Consumers: Kafka consumers are implemented in Python to subscribe to individual country topics, parse weather data in real time, and persist it into a PostgreSQL database. Each consumer handles its own subset of country topics, allowing easy horizontal scaling.

  • PostgreSQL Data Store: The consumers write structured weather data into country-specific tables in PostgreSQL. This schema is optimized for analytical querying and future integration with BI tools.

  • ETL Pipeline: Together, these components form an end-to-end ETL pipeline: simulated sensor data is read (Extract), parsed and filtered in real time (Transform), and written to structured PostgreSQL tables (Load).
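
For illustration, here is a minimal sketch of the producer side, assuming kafka-python, a local three-broker cluster, and a weather_<country> topic naming scheme; the broker addresses and topic names below are assumptions rather than the exact values in weather_producer.py:

import csv
import json
import time

from kafka import KafkaProducer

# Connect to the multi-node cluster (broker addresses are assumptions).
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with open("data/weather.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Route each record to its country's topic, e.g. "weather_australia".
        topic = f"weather_{row['Country'].lower().replace(' ', '_')}"
        producer.send(topic, value=row)
        time.sleep(1)  # throttle to mimic a sensor emitting once per second

producer.flush()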


Flowcharts

(Flowchart image of the end-to-end pipeline architecture)


Technologies Used

  • Apache Kafka: Real-time distributed streaming platform

  • Docker & Docker Compose: Containerized deployment of Kafka (multi-node), Zookeeper, and PostgreSQL

  • Python: Kafka Producers and Consumers (using kafka-python)

  • PostgreSQL: Relational database to store ingested weather data

  • SQL: Table creation and query scripts for storage and analysis


Key Features

  • Simulated real-time ingestion from weather sensors
  • Country-wise topic segmentation for organized streaming
  • Multi-node Kafka deployment for reliability and scalability
  • PostgreSQL-based persistent storage with country-level schema separation
  • Modular ETL architecture suitable for real-time analytics or downstream ML pipelines

Project Structure

Weather-Data-ETL-using-Kafka/
│
├── data/
│   └── weather.csv                # Simulated IoT sensor weather data
│
├── server1.yml                    # Docker Compose file for Kafka Node 1
├── server2.yml                    # Docker Compose file for Kafka Node 2
├── server3.yml                    # Docker Compose file for Kafka Node 3
│
├── weather_producer.py           # Python script to simulate weather data producer
├── weather_consumer1.py          # Kafka consumer for selected countries
├── weather_consumer2.py          # Additional Kafka consumer for scaling
│
├── create_postgres_schema.py     # Python script to create PostgreSQL tables
│
├── requirements.txt              # Python dependencies for producer and consumer
├── docker commands.txt           # Useful Docker CLI commands and references
│
├── .gitignore                    # Files/folders to exclude from version control
├── README.md                     # Project documentation
└── .DS_Store                     # (System file, safe to ignore or delete)

How It Works

  1. Start Kafka and PostgreSQL using Docker Compose (one file per Kafka node):

    docker-compose -f server1.yml up -d
    docker-compose -f server2.yml up -d
    docker-compose -f server3.yml up -d
  2. Run the producer to publish data to Kafka topics (based on country):

    python weather_producer.py
  3. Run the consumers to read from Kafka and store data into PostgreSQL:

    python weather_consumer1.py
    python weather_consumer2.py
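
For reference, a minimal consumer sketch that subscribes to one country topic and writes records into PostgreSQL, assuming kafka-python and psycopg2; the topic name, table name, and connection settings are assumptions rather than the exact values in weather_consumer1.py:

import json

import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "weather_australia",                   # assumed country topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

conn = psycopg2.connect(
    host="localhost", dbname="weather", user="postgres", password="postgres"
)

for message in consumer:
    record = message.value
    with conn.cursor() as cur:
        # Insert a few fields for brevity; the real script maps every column.
        cur.execute(
            "INSERT INTO weather_australia (city, temp9am, humidity9am) "
            "VALUES (%s, %s, %s)",
            (record.get("City"), record.get("Temp9am"), record.get("Humidity9am")),
        )
    conn.commit()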

Database Schema

Each country table uses the following column-to-type mapping:

{'MinTemp': 'FLOAT', 'MaxTemp': 'FLOAT', 'Rainfall': 'FLOAT',
 'Evaporation': 'FLOAT', 'Sunshine': 'FLOAT', 'WindGustDir': 'TEXT',
 'WindGustSpeed': 'FLOAT', 'WindDir9am': 'TEXT', 'WindDir3pm': 'TEXT',
 'WindSpeed9am': 'FLOAT', 'WindSpeed3pm': 'INTEGER', 'Humidity9am': 'INTEGER',
 'Humidity3pm': 'INTEGER', 'Pressure9am': 'FLOAT', 'Pressure3pm': 'FLOAT',
 'Cloud9am': 'INTEGER', 'Cloud3pm': 'INTEGER', 'Temp9am': 'FLOAT',
 'Temp3pm': 'FLOAT', 'RainToday': 'TEXT', 'RISK_MM': 'FLOAT',
 'RainTomorrow': 'TEXT', 'City': 'TEXT', 'Country': 'TEXT',
 'Coordinates': 'TEXT'}
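
As a sketch, create_postgres_schema.py could build a CREATE TABLE statement from this mapping roughly as follows (the table name and connection parameters are assumptions):

import psycopg2

columns = {
    'MinTemp': 'FLOAT', 'MaxTemp': 'FLOAT', 'Rainfall': 'FLOAT',
    # ... remaining columns from the mapping above ...
    'City': 'TEXT', 'Country': 'TEXT', 'Coordinates': 'TEXT',
}

# Assemble the DDL from the column-to-type mapping.
col_defs = ", ".join(f'"{name}" {sqltype}' for name, sqltype in columns.items())
ddl = f'CREATE TABLE IF NOT EXISTS weather_australia ({col_defs});'

# The psycopg2 connection context manager commits on a clean exit.
with psycopg2.connect(
    host="localhost", dbname="weather", user="postgres", password="postgres"
) as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)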

Use Case

This project mimics a real-world IoT and big data ETL setup for scenarios such as:

  • Agriculture advisory system: Farmers get alerts based on humidity and rainfall.
  • Smart city traffic/weather routing: Integrate with traffic simulation data for route planning.
  • Disaster response simulation: Create a reactive system for storm or flood warnings.

Installation

  1. Clone the repository:

    git clone https://github.com/Data-Projects-AGN/Weather-Data-ETL-using-Kafka.git
    cd Weather-Data-ETL-using-Kafka
  2. Install dependencies:

    pip install -r requirements.txt

Contact

For any questions, feel free to open an issue or reach out via the repository.

Abhishek Prasanna Walavalkar (abhishek.walavalkar13@gmail.com) (LinkedIn)

Gandhar Ravindra Pansare (gandharpansare@gmail.com) (LinkedIn)

Venkat Nikhil Chimata (venkatnikhil1018@gmail.com) (LinkedIn)

