VarunSai-DVS/IoT-data-streaming-pipeline

IoT Data Streaming Pipeline 🚀

A real-time data streaming platform that runs open-source tools (Apache Kafka, Apache Spark, PostgreSQL, Prometheus, and Grafana) on AWS infrastructure, with S3 for raw storage. The project simulates IoT sensor data and processes it through a scalable, fault-tolerant pipeline for storage, analysis, and visualization.

🌐 Project Overview

This project demonstrates how to build a real-time IoT data pipeline using open-source tools on AWS EC2 infrastructure. Key objectives:

  • Ingest sensor data using Kafka
  • Process data in near real-time using Apache Spark
  • Store raw data in S3 and transformed data in PostgreSQL
  • Monitor system performance using Prometheus and Grafana
  • Use AWS CDK (Python) for infrastructure as code

🏗️ Architecture

Sensor (AWS Lambda)
       ↓
   Kafka (EC2)
       ↓
Spark Processor (EC2)
  ↙           ↘
S3 (Raw)   PostgreSQL (Processed)
       ↓
 Monitoring
  (Prometheus + Grafana)
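
The flow in the diagram can be sketched with in-memory stand-ins. This is a minimal illustration only, not the project's implementation: the `deque` stands in for a Kafka topic, the two lists stand in for the S3 bucket and the PostgreSQL table, and the field names (`sensor_id`, `temperature_c`, `humidity_pct`) and the Celsius-to-Fahrenheit transform are assumptions made for the example.

```python
import json
import random
import time
from collections import deque


def make_reading(sensor_id: str) -> dict:
    """Build one simulated reading (field names are assumptions)."""
    return {
        "sensor_id": sensor_id,
        "timestamp": time.time(),
        "temperature_c": round(random.uniform(15.0, 35.0), 2),
        "humidity_pct": round(random.uniform(20.0, 80.0), 2),
    }


def process(topic: deque, raw_sink: list, processed_sink: list) -> None:
    """Drain the queue and write every message to both sinks."""
    while topic:
        message = topic.popleft()
        raw_sink.append(message)  # untouched JSON string, as it would land in S3
        record = json.loads(message)
        # Illustrative transform only: convert the temperature to Fahrenheit.
        processed_sink.append({
            "sensor_id": record["sensor_id"],
            "timestamp": record["timestamp"],
            "temperature_f": round(record["temperature_c"] * 9 / 5 + 32, 2),
            "humidity_pct": record["humidity_pct"],
        })


topic = deque()        # stand-in for a Kafka topic
raw_sink = []          # stand-in for the S3 raw bucket
processed_sink = []    # stand-in for the PostgreSQL table

for i in range(3):
    topic.append(json.dumps(make_reading(f"sensor-{i}")))

process(topic, raw_sink, processed_sink)
print(len(raw_sink), len(processed_sink))
```

Keeping the raw message untouched in one sink and writing only the transformed record to the other mirrors the split between S3 (raw) and PostgreSQL (processed) shown above.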

📦 Tech Stack

| Component     | Tech                       |
|---------------|----------------------------|
| Sensor Sim    | AWS Lambda (Python)        |
| Message Queue | Apache Kafka (on EC2)      |
| Processing    | Apache Spark (on EC2)      |
| Storage       | AWS S3, AWS RDS (Postgres) |
| Monitoring    | Prometheus, Grafana        |
| IaC           | AWS CDK (Python)           |

🧹 Features

  • ⚡ Real-time data ingestion with Kafka
  • 🔥 Stream processing using Spark
  • 📂 Dual storage: S3 for raw and Postgres for processed data
  • 📈 Live system metrics using Prometheus and Grafana
  • 🛠️ Reproducible infra setup using AWS CDK
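
To make the metrics feature concrete: Prometheus scrapes plain-text metrics over HTTP, in a format made of `# HELP` and `# TYPE` comment lines followed by `name value` samples. The sketch below renders two counters in that format using only the standard library. The metric names are hypothetical, since the project does not define any yet, and a real exporter would more likely use a Prometheus client library than hand-built strings.

```python
def render_counters(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format.

    `counters` maps a metric name to a (help text, value) pair.
    """
    lines = []
    for name in sorted(counters):
        help_text, value = counters[name]
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values, for illustration only.
metrics = {
    "iot_messages_processed_total": ("Messages processed by the pipeline.", 1280),
    "iot_abnormal_readings_total": ("Readings flagged as abnormal.", 7),
}
print(render_counters(metrics), end="")
```

Grafana would then chart these series from Prometheus rather than reading them from the pipeline directly.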

🚧 Project Milestones

  • Architecture finalized
  • CDK project setup
  • Kafka EC2 instance
  • Spark EC2 instance
  • Lambda sensor simulator
  • PostgreSQL schema + RDS
  • Prometheus + Grafana monitoring
  • Final integration + demo

📁 Folder Structure (Planned)

iot-streaming-pipeline/
│
├── cdk/                  # AWS CDK stacks
├── lambda-simulator/     # Sensor simulation code
├── kafka-setup/          # Kafka scripts and Dockerfiles
├── spark-processor/      # PySpark processing logic
├── monitoring/           # Prometheus & Grafana config
├── infra-scripts/        # Setup scripts
├── README.md
└── .gitignore

🧪 Demo Use Case

Simulate temperature and humidity data from smart building sensors and process this data to:

  • Detect abnormal readings
  • Store trends in a database
  • Trigger alerts via monitoring dashboard
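
One simple way to detect abnormal readings is a fixed range check per field. The sketch below assumes hypothetical field names (`temperature_c`, `humidity_pct`) and illustrative comfort ranges; none of these values come from the project, and a production pipeline might use rolling statistics instead of static thresholds.

```python
# Illustrative ranges only; real thresholds would depend on the building.
NORMAL_RANGES = {
    "temperature_c": (18.0, 27.0),
    "humidity_pct": (30.0, 60.0),
}


def find_anomalies(reading: dict) -> list:
    """Return a description for every field that is missing or out of range."""
    anomalies = []
    for field, (low, high) in NORMAL_RANGES.items():
        value = reading.get(field)
        if value is None:
            anomalies.append(f"{field}: missing")
        elif not low <= value <= high:
            anomalies.append(f"{field}: {value} outside [{low}, {high}]")
    return anomalies


readings = [
    {"sensor_id": "room-101", "temperature_c": 22.5, "humidity_pct": 45.0},
    {"sensor_id": "room-102", "temperature_c": 41.0, "humidity_pct": 45.0},
]
for reading in readings:
    print(reading["sensor_id"], find_anomalies(reading))
```

An empty list means the reading is within range; a non-empty list is what an alerting rule or dashboard panel could count.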

🧑‍💻 Authors

Sai Kiran Anumalla
Varun Sai Danduri
MSCS @ Northeastern University
GitHub: @saikirananumalla, @VarunSai-DVS

📜 License

MIT License – see LICENSE file for details.
