A fully open-source real-time data streaming platform built on AWS infrastructure using Apache Kafka, Apache Spark, PostgreSQL, S3, Prometheus, and Grafana. This project simulates IoT sensor data and processes it through a scalable, fault-tolerant pipeline for storage, analysis, and visualization.
This project demonstrates how to build a real-time IoT data pipeline using open-source tools on AWS EC2 infrastructure. Key objectives:
- Ingest sensor data using Kafka
- Process data in near real-time using Apache Spark
- Store raw data in S3 and transformed data in PostgreSQL
- Monitor system performance using Prometheus and Grafana
- Use AWS CDK (Python) for infrastructure as code
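As a rough illustration of the ingestion objective, the sketch below builds one simulated reading and serializes it to the bytes a Kafka producer could send as a message value. The field names (`sensor_id`, `temperature_c`, `humidity_pct`, `timestamp`), the value ranges, and the helper names are assumptions made for this example, not the project's actual schema. The Kafka producer call itself is left out so the snippet runs with the standard library only.

```python
import json
import random
from datetime import datetime, timezone


def make_reading(sensor_id: str, rng: random.Random) -> dict:
    """Build one simulated reading; field names and ranges are illustrative."""
    return {
        "sensor_id": sensor_id,
        "temperature_c": round(rng.uniform(15.0, 35.0), 2),
        "humidity_pct": round(rng.uniform(20.0, 80.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def encode_reading(reading: dict) -> bytes:
    """Serialize a reading to UTF-8 JSON, a common format for Kafka message values."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs are comparable
    for _ in range(3):
        print(encode_reading(make_reading("building-1-floor-2", rng)))
```

In a Lambda handler, the encoded bytes would typically be handed to whichever Kafka client the simulator uses.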
Sensor (AWS Lambda)
        ↓
   Kafka (EC2)
        ↓
Spark Processor (EC2)
     ↙     ↘
S3 (Raw)   PostgreSQL (Processed)
        ↓
   Monitoring
(Prometheus + Grafana)
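The branch after the Spark processor can be pictured as a small routing step: each message is kept unmodified for S3 and also flattened into a row for PostgreSQL. The sketch below is one possible shape for that step, not the project's implementation. The date-partitioned key layout, the column order, and the function names are all hypothetical, and the actual S3 and database writes are omitted.

```python
import json
from datetime import datetime


def raw_s3_key(message: dict, prefix: str = "raw") -> str:
    """Derive a date-partitioned object key for the unmodified message (assumed layout)."""
    ts = datetime.fromisoformat(message["timestamp"])
    return f"{prefix}/{ts:%Y/%m/%d}/{message['sensor_id']}-{ts:%H%M%S}.json"


def processed_row(message: dict) -> tuple:
    """Flatten a message into the column order of a hypothetical readings table."""
    return (
        message["sensor_id"],
        message["timestamp"],
        float(message["temperature_c"]),
        float(message["humidity_pct"]),
    )


def route(raw_value: bytes) -> tuple:
    """Return (s3_key, raw_bytes, row): the raw copy plus the processed record."""
    message = json.loads(raw_value)
    return raw_s3_key(message), raw_value, processed_row(message)
```

Keeping the raw bytes untouched means the S3 copy can be reprocessed later if the transformation logic changes.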
Component | Tech |
---|---|
Sensor Sim | AWS Lambda (Python) |
Message Queue | Apache Kafka (on EC2) |
Processing | Apache Spark (on EC2) |
Storage | AWS S3, AWS RDS (Postgres) |
Monitoring | Prometheus, Grafana |
IaC | AWS CDK (Python) |
- Real-time data ingestion with Kafka
- Stream processing using Spark
- Dual storage: S3 for raw and Postgres for processed data
- Live system metrics using Prometheus and Grafana
- Reproducible infra setup using AWS CDK
- Architecture finalized
- CDK project setup
- Kafka EC2 instance
- Spark EC2 instance
- Lambda sensor simulator
- PostgreSQL schema + RDS
- Prometheus + Grafana monitoring
- Final integration + demo
iot-streaming-pipeline/
│
├── cdk/               # AWS CDK stacks
├── lambda-simulator/  # Sensor simulation code
├── kafka-setup/       # Kafka scripts and Dockerfiles
├── spark-processor/   # PySpark processing logic
├── monitoring/        # Prometheus & Grafana config
├── infra-scripts/     # Setup scripts
├── README.md
└── .gitignore
Simulate temperature and humidity data from smart building sensors and process this data to:
- Detect abnormal readings
- Store trends in a database
- Trigger alerts via the monitoring dashboard
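Detecting abnormal readings could be as simple as a range check per measured field. The sketch below assumes fixed limits. The thresholds, the field names (`temperature_c`, `humidity_pct`), and the `abnormal_fields` helper are hypothetical choices for illustration, and a real deployment might prefer rolling statistics over static bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Inclusive lower and upper bounds for one measured field."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical comfort ranges; tune these for real buildings.
LIMITS = {
    "temperature_c": Limits(18.0, 27.0),
    "humidity_pct": Limits(30.0, 60.0),
}


def abnormal_fields(reading: dict) -> list:
    """Return the names of fields that are missing or outside their limits."""
    flagged = []
    for field, limits in LIMITS.items():
        value = reading.get(field)
        if value is None or not limits.contains(value):
            flagged.append(field)
    return flagged
```

A non-empty result could be stored alongside the reading or exported as a metric for the dashboard to alert on.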
Sai Kiran Anumalla
Varun Sai Danduri
MSCS @ Northeastern University
GitHub: @saikirananumalla
@VarunSai-DVS
MIT License. See the LICENSE file for details.