This project is an end-to-end data engineering pipeline that extracts weather forecast data from BMKG (Badan Meteorologi, Klimatologi, dan Geofisika, Indonesia's non-departmental government agency for meteorology, climatology, and geophysics), stores it in a local database, and feeds it into a dashboard. The project is built entirely in Python.
The data comes from BMKG's open weather forecast data. Extraction is not straightforward: direct requests to the API are blocked, and the website prevents scraping. The project works around this with the "IMPORTHTML" and "IMPORTXML" formulas in Google Sheets, which retrieve the data through Google's servers and so get past the website's restrictions. This is the link to copy the Google Sheets file that was used to extract the data.
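One possible way to bring the values collected by the Sheets formulas into Python is to read the sheet's CSV export. The sketch below assumes the sheet is viewable by anyone with the link. `SHEET_ID`, the `gid` value, and the column names (`datetime`, `weather`, `temperature_c`) are placeholders for illustration, not the project's actual layout.

```python
import csv
import io
import urllib.request

# Placeholder: replace with the ID from your own copy of the sheet.
SHEET_ID = "YOUR_SHEET_ID"


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab (gid) of a Google Sheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def parse_forecast_csv(text: str) -> list[dict[str, str]]:
    """Turn CSV text into one dict per row, dropping rows with no values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row for row in reader
        if any(str(value or "").strip() for value in row.values())
    ]


def fetch_forecast_rows(sheet_id: str, gid: int = 0) -> list[dict[str, str]]:
    """Download one sheet tab and parse it (requires network access)."""
    url = sheet_csv_url(sheet_id, gid)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_forecast_csv(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means the parsing step can be checked against a saved CSV file without touching the network.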
After extraction, the project stores the raw data locally: local storage takes the place of S3 as the data warehouse. It then cleans the raw data, relying heavily on the pandas module, and stores the result in a database on a local PostgreSQL server. The psycopg2 module connects Python to PostgreSQL.
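As a rough illustration of the clean-then-load step, the sketch below coerces types with pandas and upserts the result with psycopg2. The column names, the `weather_forecast` table, and the unique constraint on `forecast_time` that `ON CONFLICT` relies on are all assumptions made for this example; the project's real schema may differ.

```python
import pandas as pd


def clean_forecast(raw: pd.DataFrame) -> pd.DataFrame:
    """Coerce types, drop unusable rows, and keep one row per timestamp."""
    df = raw.copy()
    # Values that cannot be parsed become NaT/NaN and are dropped below.
    df["forecast_time"] = pd.to_datetime(df["forecast_time"], errors="coerce")
    df["temperature_c"] = pd.to_numeric(df["temperature_c"], errors="coerce")
    df["weather"] = df["weather"].astype("string").str.strip()
    df = df.dropna(subset=["forecast_time", "temperature_c"])
    # When a timestamp appears more than once, keep the latest row.
    df = df.drop_duplicates(subset="forecast_time", keep="last")
    return df.sort_values("forecast_time").reset_index(drop=True)


def load_forecast(clean: pd.DataFrame, dsn: str) -> int:
    """Upsert cleaned rows into PostgreSQL and return how many were sent."""
    # Imported here so the cleaning step can be used without the driver.
    import psycopg2

    # Convert pandas/NumPy values to plain Python types for the driver.
    rows = [
        (
            r.forecast_time.to_pydatetime(),
            None if pd.isna(r.weather) else str(r.weather),
            float(r.temperature_c),
        )
        for r in clean.itertuples(index=False)
    ]
    sql = """
        INSERT INTO weather_forecast (forecast_time, weather, temperature_c)
        VALUES (%s, %s, %s)
        ON CONFLICT (forecast_time) DO UPDATE
        SET weather = EXCLUDED.weather,
            temperature_c = EXCLUDED.temperature_c
    """
    conn = psycopg2.connect(dsn)
    try:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.executemany(sql, rows)
    finally:
        conn.close()
    return len(rows)
```

An upsert is used here so that re-running the load for the same day does not create duplicate rows. A plain `INSERT` into a freshly truncated table would be a simpler alternative.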
The project also schedules the extract-load-transform workflow to run daily with Apache Airflow, which runs inside a Docker container. Finally, it uses Streamlit, a Python app framework, to display a dashboard showing the weather forecast and temperature for Kab. Kupang, NTT. For display purposes, the dashboard reads its data from this project directory.
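A daily extract-load-transform schedule of this kind could be wired up roughly as follows. This is a minimal sketch assuming Airflow 2.4 or later in the 2.x series. The DAG ID, task names, retry settings, and placeholder task bodies are hypothetical and stand in for the project's own code.

```python
from datetime import datetime, timedelta


# Placeholder task bodies: the project's real extract, load, and transform
# steps are not reproduced here.
def extract_raw() -> str:
    return "extracted"


def load_raw() -> str:
    return "loaded"


def transform_clean() -> str:
    return "transformed"


def build_dag():
    """Create a daily DAG that runs extract, then load, then transform."""
    # Imported inside the function so the task callables above can be
    # unit-tested on a machine that does not have Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="bmkg_weather_elt",        # hypothetical name
        start_date=datetime(2024, 1, 1),  # arbitrary start date
        schedule="@daily",                # run once per day
        catchup=False,                    # do not backfill past days
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_raw)
        load = PythonOperator(task_id="load", python_callable=load_raw)
        transform = PythonOperator(
            task_id="transform", python_callable=transform_clean
        )
        # Each task starts only after the previous one succeeds.
        extract >> load >> transform
    return dag
```

Airflow only discovers DAG objects bound to module-level names, so a file placed in the `dags/` folder would end with `dag = build_dag()`.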
Trivia:
- This is just a sample project, so the dataset is small. The project deliberately covers a single location and omits other metrics such as humidity and wind direction.
- The project chooses Kab. Kupang, NTT because it is the regency where the Indonesian National Observatory (at Mount Timau) is located, so this dashboard could be used to track the weather for astronomical observation.
References:
- Weather Data - Data Prakiraan Cuaca Terbuka BMKG
- Project Structure Inspiration - Surfline Dashboard
- Docker Compose For Airflow - Running Airflow in Docker
- Airflow Docker Tutorial - How to Install and Run Apache Airflow Using Docker in Windows 11 | Airflow Docker #airflow
- Deploy Airflow - Deploy Apache Airflow in Multiple Docker Containers
- Streamlit Tutorial - Python Streamlit Full Course
- Streamlit Tutorial (2) - Building a Dashboard web app in Python - Full Streamlit Tutorial