IoT Data Streaming Pipeline 🚀

An open-source, real-time data streaming platform built on AWS infrastructure using Apache Kafka, Apache Spark, PostgreSQL, Amazon S3, Prometheus, and Grafana. The project simulates IoT sensor data and processes it through a scalable, fault-tolerant pipeline for storage, analysis, and visualization.

🌐 Project Overview

This project demonstrates how to build a real-time IoT data pipeline using open-source tools on AWS EC2 infrastructure. Key objectives:

Ingest sensor data using Kafka
Process data in near real-time using Apache Spark
Store raw data in S3 and transformed data in PostgreSQL
Monitor system performance using Prometheus and Grafana
Use AWS CDK (Python) for infrastructure as code

🏗️ Architecture

Sensor (AWS Lambda)
       ↓
   Kafka (EC2)
       ↓
Spark Processor (EC2)
  ↙           ↘
S3 (Raw)   PostgreSQL (Processed)
       ↓
 Monitoring
  (Prometheus + Grafana)
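
The two branches after the Spark processor can be illustrated with a small in-memory sketch. This is not the project's Spark code: the sinks are plain Python containers standing in for S3 and PostgreSQL, and the record fields (`sensor_id`, `timestamp`, `temperature_c`, `humidity_pct`), the object key layout, and the row shape are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def raw_object_key(record: dict) -> str:
    """Build a date-partitioned object key for the raw copy (hypothetical layout)."""
    ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return f"raw/{ts:%Y/%m/%d}/{record['sensor_id']}-{record['timestamp']}.json"


def to_processed_row(record: dict) -> tuple:
    """Flatten a record into a row for a relational table (hypothetical columns)."""
    ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return (
        record["sensor_id"],
        ts.isoformat(),
        round(record["temperature_c"], 1),
        round(record["humidity_pct"], 1),
    )


def fan_out(records, raw_sink: dict, processed_sink: list) -> None:
    """Write every record to both sinks, mirroring the two branches in the diagram."""
    for record in records:
        # Stand-in for the S3 branch: the untouched record, keyed by date and sensor.
        raw_sink[raw_object_key(record)] = json.dumps(record)
        # Stand-in for the PostgreSQL branch: the transformed row.
        processed_sink.append(to_processed_row(record))
```

Keeping the raw copy unmodified means the processed table could be rebuilt later if the transformation changes.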

📦 Tech Stack

Component	Technology
Sensor simulator	AWS Lambda (Python)
Message queue	Apache Kafka (on EC2)
Processing	Apache Spark (on EC2)
Storage	Amazon S3, Amazon RDS (PostgreSQL)
Monitoring	Prometheus, Grafana
IaC	AWS CDK (Python)
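
As a rough idea of what the Lambda-based sensor simulator might look like, here is a minimal sketch. The event fields (`sensor_ids`, `seed`), the value ranges, and the message schema are hypothetical, and the actual publish step is left out because it depends on which Kafka client the project uses.

```python
import json
import random
import time
from typing import Tuple


def make_reading(sensor_id: str, rng: random.Random) -> dict:
    """Generate one simulated reading (ranges are illustrative assumptions)."""
    return {
        "sensor_id": sensor_id,
        "timestamp": int(time.time()),
        "temperature_c": round(rng.uniform(18.0, 28.0), 2),
        "humidity_pct": round(rng.uniform(30.0, 60.0), 2),
    }


def to_kafka_message(reading: dict) -> Tuple[bytes, bytes]:
    """Encode a reading as a (key, value) pair of bytes.

    Keying by sensor ID keeps one sensor's readings in the same partition,
    which preserves their relative order.
    """
    key = reading["sensor_id"].encode("utf-8")
    value = json.dumps(reading, sort_keys=True).encode("utf-8")
    return key, value


def handler(event, context):
    """Lambda entry point: build one message per requested sensor."""
    sensor_ids = event.get("sensor_ids", ["sensor-1"])
    rng = random.Random(event.get("seed"))  # seed is optional, useful for tests
    messages = [to_kafka_message(make_reading(s, rng)) for s in sensor_ids]
    # Publishing `messages` to Kafka would happen here; omitted in this sketch.
    return {"message_count": len(messages)}
```

Invoking `handler({"sensor_ids": ["a", "b"]}, None)` locally builds two messages and returns a count, which is enough to check the encoding logic without a broker.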

🧹 Features

⚡ Real-time data ingestion with Kafka
🔥 Stream processing using Spark
📂 Dual storage: S3 for raw data and PostgreSQL for processed data
📈 Live system metrics using Prometheus and Grafana
🛠️ Reproducible infrastructure setup using AWS CDK
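
To make the metrics feature more concrete, the sketch below renders a counter in the Prometheus text exposition format, which is what Prometheus reads when it scrapes a target. The metric name and the `stage` label are hypothetical, and a real service would more likely rely on a Prometheus client library than format the text by hand.

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter with a `stage` label in the text exposition format.

    `samples` maps a stage name to its current count. Label values are
    assumed to need no escaping in this simplified sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for stage, value in sorted(samples.items()):
        lines.append(f'{name}{{stage="{stage}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_counter(
            "iot_records_processed_total",  # hypothetical metric name
            "Records processed by each pipeline stage.",
            {"kafka": 50, "spark": 42},
        ),
        end="",
    )
```

A Grafana panel could then chart the rate of such a counter per stage to show throughput over time.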

🚧 Project Milestones

📁 Folder Structure (Planned)

iot-streaming-pipeline/
│
├── cdk/                  # AWS CDK stacks
├── lambda-simulator/     # Sensor simulation code
├── kafka-setup/          # Kafka scripts and Dockerfiles
├── spark-processor/      # PySpark processing logic
├── monitoring/           # Prometheus & Grafana config
├── infra-scripts/        # Setup scripts
├── README.md
└── .gitignore

🧪 Demo Use Case

Simulate temperature and humidity data from smart building sensors and process this data to:

Detect abnormal readings
Store trends in a database
Trigger alerts via the monitoring dashboard
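
The "detect abnormal readings" step could start as a simple range check like the one below. The thresholds and field names are assumptions chosen for illustration, not values taken from the project, and the real pipeline would apply equivalent logic inside Spark.

```python
# Hypothetical acceptable ranges for a smart building; tune for real deployments.
TEMPERATURE_RANGE_C = (15.0, 30.0)
HUMIDITY_RANGE_PCT = (20.0, 70.0)


def find_abnormal(reading: dict) -> list:
    """Return the names of the measurements that fall outside their range."""
    problems = []
    low, high = TEMPERATURE_RANGE_C
    if not low <= reading["temperature_c"] <= high:
        problems.append("temperature")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= reading["humidity_pct"] <= high:
        problems.append("humidity")
    return problems


if __name__ == "__main__":
    sample = {"sensor_id": "sensor-1", "temperature_c": 41.5, "humidity_pct": 45.0}
    print(find_abnormal(sample))  # ['temperature']
```

An empty list means the reading is within range; a non-empty list could be counted as a metric so that the dashboard can raise an alert.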

🧑‍💻 Authors

Sai Kiran Anumalla
Varun Sai Danduri
MSCS @ Northeastern University
GitHub: @saikirananumalla, @VarunSai-DVS

📜 License

MIT License – see LICENSE file for details.
