The Weather Stations Monitoring project aims to efficiently process and analyze high-frequency data streams from distributed weather stations within the Internet of Things (IoT) context. This project was developed as part of the course "Designing Data Intensive Applications."
The system involves designing and implementing a robust architecture comprising data acquisition, processing, archiving, and indexing stages. It also integrates with the Open-Meteo API to enrich its data sources. The primary goal is a scalable, reliable weather monitoring system capable of handling diverse data types, ensuring data integrity, and enabling advanced analytics for weather forecasting and analysis.
- Integrated with the Open-Meteo API to simulate real-time weather data.
- The Weather Station mock fetches weather information such as temperature, humidity, and wind speed for specified locations.
- Data is periodically sent to the Central Station system for storage and retrieval, mimicking the behavior of actual weather stations (sketched below).
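A minimal sketch of the station mock under these assumptions: a local broker at `localhost:9092`, one message per second, and the raw JSON response forwarded as the message value (the real mock presumably parses it into a proper status message first); the coordinates and query parameters are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WeatherStationMock {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        // Open-Meteo forecast endpoint; coordinates and fields are placeholders.
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "https://api.open-meteo.com/v1/forecast"
                + "?latitude=30.04&longitude=31.24"
                + "&current_weather=true&hourly=relativehumidity_2m")).build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
                // Forward the reading to the "weather" topic, keyed by station ID.
                producer.send(new ProducerRecord<>("weather", "station-1", body));
                Thread.sleep(1_000); // one reading per second
            }
        }
    }
}
```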
- Used Bitnami images to create the Kafka and ZooKeeper containers.
- Weather Stations use the Kafka Producer API to publish readings and station status to the "weather" topic.
- The Central Station uses the Consumer API to consume messages from that topic (a consumer sketch follows this list).
- Used the Kafka Streams Processor API to process records arriving on the "weather" topic.
- Sent a special message to the "rain" topic, indicating that it is raining, whenever a reading's humidity level exceeds 70% (see the Processor API sketch below).
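On the consuming side, the Central Station's poll loop could look like the following sketch; the group ID `central-station` and the String deserializers are assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CentralStationConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "central-station");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("weather"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Each message would be handed to the Bitcask store and the
                    // Parquet batching described in the Central Station bullets.
                    System.out.printf("station=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```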
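The rain-detection step might be wired up with the Processor API roughly as follows; the JSON shape of the message value (a numeric `humidity` field) and the regex-based extraction are illustrative stand-ins for the project's actual parsing:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class RainDetector {

    // Forwards a rain alert whenever a reading's humidity exceeds 70%.
    static class RainProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            if (extractHumidity(record.value()) > 70) {
                context.forward(record.withValue(
                        "{\"station\":\"" + record.key() + "\",\"alert\":\"raining\"}"));
            }
        }

        // Placeholder extraction: finds the number after "humidity" in the JSON.
        private static int extractHumidity(String json) {
            java.util.regex.Matcher m = java.util.regex.Pattern
                    .compile("\"humidity\"\\s*:\\s*(\\d+)").matcher(json);
            return m.find() ? Integer.parseInt(m.group(1)) : 0;
        }
    }

    public static void main(String[] args) {
        // Source reads "weather"; the processor's output is sunk into "rain".
        Topology topology = new Topology();
        topology.addSource("Source", Serdes.String().deserializer(),
                        Serdes.String().deserializer(), "weather")
                .addProcessor("Rain", RainProcessor::new, "Source")
                .addSink("Sink", "rain", Serdes.String().serializer(),
                        Serdes.String().serializer(), "Rain");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rain-detector");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(topology, props).start();
    }
}
```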
- The Central Station is the core component responsible for processing and archiving weather data received from multiple weather stations.
- Utilizes Bitcask for key-value storage, archives data into Parquet files, and indexes it in Elasticsearch for further analysis.
- Manages the creation of Kafka topics for weather data and rain alerts.
- Manages data segment files, periodically compacts segments to save storage, and takes snapshots for quick recovery (a minimal Bitcask-style sketch follows this list).
- Converts stored weather data into Parquet format for efficient storage and querying (see the Parquet writer sketch below).
- Uses a watch script to monitor new Parquet files and triggers a JAR that ingests the data into Elasticsearch for near-real-time indexing and analysis (see the bulk-indexing sketch below).
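The store behind these bullets follows the Bitcask model: an append-only segment file plus an in-memory hash index (the "keydir") that maps each station ID to the offset of its latest record. A minimal sketch, assuming long station IDs and opaque byte-array values; compaction and snapshotting are left out:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class MiniBitcask {
    private final RandomAccessFile file;                     // active append-only segment
    private final Map<Long, Long> keydir = new HashMap<>();  // stationId -> offset of latest record

    public MiniBitcask(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.file.seek(this.file.length());
    }

    // Appends a record and points the keydir at it; superseded entries become
    // garbage that a background compaction pass later reclaims.
    public synchronized void put(long stationId, byte[] value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeLong(stationId);
        file.writeInt(value.length);
        file.write(value);
        keydir.put(stationId, offset);
    }

    // Reads the latest value for a station with a single disk seek.
    public synchronized byte[] get(long stationId) throws IOException {
        Long offset = keydir.get(stationId);
        if (offset == null) return null;
        file.seek(offset + Long.BYTES); // skip the stored key
        byte[] value = new byte[file.readInt()];
        file.readFully(value);
        return value;
    }
}
```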
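Archiving a buffered batch to Parquet could be done with the `parquet-avro` library as in this sketch; the Avro schema and field names (`station_id`, `s_no`, `humidity`, `status_timestamp`) are assumptions based on the fields mentioned elsewhere in this README:

```java
import java.io.IOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetArchiver {
    // Illustrative schema: station ID, sequence number, humidity, timestamp.
    static final Schema SCHEMA = SchemaBuilder.record("WeatherStatus").fields()
            .requiredLong("station_id")
            .requiredLong("s_no")
            .requiredInt("humidity")
            .requiredLong("status_timestamp")
            .endRecord();

    // Writes one buffered batch of messages to its own Parquet file.
    public static void writeBatch(List<GenericRecord> batch, String fileName)
            throws IOException {
        try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(new Path(fileName))
                             .withSchema(SCHEMA)
                             .build()) {
            for (GenericRecord record : batch) {
                writer.write(record);
            }
        }
    }
}
```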
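The uploader JAR could index a batch through Elasticsearch's `_bulk` endpoint; below is a sketch using plain HTTP against a local cluster, where the index name `weather` and the NDJSON document shape are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class ElasticsearchUploader {
    // Sends documents to the _bulk API as newline-delimited JSON:
    // an action line followed by the document source, one pair per document.
    public static void bulkIndex(List<String> jsonDocs) throws Exception {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"weather\"}}\n");
            body.append(doc).append('\n');
        }
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:9200/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body.toString()))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Bulk indexing failed: " + response.body());
        }
    }
}
```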
- Created Docker images for the weather stations, the Central Station, and the Elasticsearch uploader.
- Used Kubernetes for deploying components, including weather stations, Zookeeper, Kafka broker, Central Station, Elasticsearch uploader, Elasticsearch, and Kibana.
- Implemented shared storage for persisting Parquet files and Bitcask entries.
For a faster experience, the Central Station buffers only 100 records per station before archiving them into a Parquet file.
- Kibana formulas used to compute the dropped-messages percentage and the low-battery percentage:
  - Dropped messages percentage: `1 - count() / max(s_no)`
  - Low battery percentage: `count(kql='battery_status.keyword : "low"') / count()`