This project successfully implements an automated ETL (Extract, Transform, Load) process that collects, transforms, and stores weather data for Casablanca, Morocco, focusing on evaluating the accuracy of daily temperature forecasts.

| Task | Status |
|---|---|
| Initialize weather log file (`rx_poc.log`) | ✅ Completed |
| Write ETL Bash script to download, parse, and log weather data | ✅ Completed |
| Automate script execution daily at noon via cron | ✅ Completed |
| Write accuracy analysis script (`fc_accuracy.sh`) | ✅ Completed |
| Calculate daily forecast accuracy and label it | ✅ Completed |
| Append daily accuracy report to `historical_fc_accuracy.tsv` | ✅ Completed |
| Generalize script to compute historical accuracy for multiple days | ✅ Completed |
| Create weekly stats script to report min/max absolute forecast error | ✅ Completed |

____________________________________________________________________________________________________________________________
Weather report: Casablanca
\ / Partly cloudy
_ /"".-. 21 °C
\_( ). ↘ 5 km/h
/(___(__) 10 km
0.0 mm
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Sat 05 Jul ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Sunny │ \ / Sunny │ \ / Sunny │ \ / Clear │
│ .-. +23(25) °C │ .-. +25(26) °C │ .-. +23(25) °C │ .-. 21 °C │
│ ― ( ) ― ↓ 8-9 km/h │ ― ( ) ― ↘ 15-17 km/h │ ― ( ) ― ↘ 15-17 km/h │ ― ( ) ― ↓ 8-11 km/h │
│ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │
│ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Sun 06 Jul ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Sunny │ \ / Sunny │ \ / Sunny │ \ / Clear │
│ .-. +23(25) °C │ .-. +25(26) °C │ .-. +24(26) °C │ .-. 22 °C │
│ ― ( ) ― ↘ 8-9 km/h │ ― ( ) ― ↘ 15-17 km/h │ ― ( ) ― ↘ 14-16 km/h │ ― ( ) ― ↘ 8-12 km/h │
│ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │
│ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Mon 07 Jul ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Sunny │ \ / Sunny │ \ / Sunny │ _`/"".-. Patchy rain ne…│
│ .-. +24(25) °C │ .-. +26(27) °C │ .-. +24(26) °C │ ,\_( ). +22(25) °C │
│ ― ( ) ― → 8-10 km/h │ ― ( ) ― ↘ 14-16 km/h │ ― ( ) ― ↘ 15-17 km/h │ /(___(__) ↘ 5-7 km/h │
│ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │ ‘ ‘ ‘ ‘ 10 km │
│ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ ‘ ‘ ‘ ‘ 0.0 mm | 72% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
Location: Casablanca ⵜⵉⴳⵎⵉ ⵜⵓⵎⵍⵉⵍⵜ الدار البيضاء, préfecture d'arrondissements de Casablanca-Anfa عمالة مقاطعات الدار البيضاء أنفا, Pachalik de Casablanca, Préfecture de Casablanca عمالة الدار البيضاء, Casablanca-Settat ⵜⵉⴳⵎⵉ ⵜⵓⵎⵍⵉⵍⵜ-ⵙⵟⵟⴰⵜ الدار البيضاء-سطات, ⵍⵎⵖⵔⵉⴱ المغرب [33.5949733,-7.6188008]
Follow @igor_chubin for wttr.in updates
_____________________________________________________________________________________________________________________________
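
Weather log (`rx_poc.log`):

| year | month | day | obs_temp | fc_temp |
|---|---|---|---|---|
| 2023 | 01 | 01 | 10 | 11 |
| 2023 | 01 | 02 | 11 | 12 |
| 2023 | 01 | 03 | 12 | 10 |
| ... | ... | ... | ... | ... |

Historical accuracy report (`historical_fc_accuracy.tsv`):

| year | month | day | obs_temp | fc_temp | accuracy | accuracy_range |
|---|---|---|---|---|---|---|
| 2023 | 01 | 02 | 11 | 12 | -1 | excellent |
| 2023 | 01 | 03 | 12 | 10 | 2 | good |
| 2023 | 01 | 04 | 13 | 13 | 0 | excellent |
| ... | ... | ... | ... | ... | ... | ... |
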
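
The accuracy columns can be derived from the log with a short filter. The following is a minimal sketch, not the project's actual `fc_accuracy.sh`. It assumes that `accuracy` is `obs_temp - fc_temp` on the same row, which is consistent with the three sample rows shown. It also assumes label thresholds of ±1 for `excellent` and ±2 for `good`. The function name `add_accuracy` and the `fair` and `poor` labels are hypothetical and do not come from the project.

```shell
# Read a tab-separated log on stdin (header row, then
# year, month, day, obs_temp, fc_temp) and print it with two extra columns.
# ASSUMPTIONS: accuracy = obs_temp - fc_temp; thresholds 1/2/3 are illustrative.
add_accuracy() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { print $0, "accuracy", "accuracy_range"; next }
        {
            err = $4 - $5                      # signed error
            abs = (err < 0) ? -err : err       # magnitude used for the label
            if (abs <= 1)      label = "excellent"
            else if (abs <= 2) label = "good"
            else if (abs <= 3) label = "fair"  # hypothetical label
            else               label = "poor"  # hypothetical label
            print $0, err, label
        }'
}
```

It can be used as a filter, for example `add_accuracy < rx_poc.log`, with the result redirected to a report file of your choice.
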
Minimum absolute forecast error (last 7 days): 0
Maximum absolute forecast error (last 7 days): 3
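
A weekly summary of this kind can be produced from the signed accuracy values alone. The sketch below is one possible approach, not the project's weekly stats script. The function name `weekly_error_range` is hypothetical, and the input is assumed to be one signed error per line, oldest first.

```shell
# Read signed forecast errors (one per line, oldest first) on stdin and
# report the smallest and largest absolute error among the last 7 values.
weekly_error_range() {
    tail -n 7 | awk '
        { a = ($1 < 0) ? -$1 : $1 }            # absolute value of this error
        NR == 1 || a < min { min = a }
        NR == 1 || a > max { max = a }
        END {
            if (NR == 0) exit 1                # no data: fail without output
            print "Minimum absolute forecast error (last 7 days): " min
            print "Maximum absolute forecast error (last 7 days): " max
        }'
}
```

For example, the accuracy column can be piped in with `tail -n +2 historical_fc_accuracy.tsv | cut -f 6 | weekly_error_range`, assuming the file has a header row and keeps `accuracy` in the sixth column.
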
The ETL process (`rx_poc.sh`) is executed daily at noon local time in Casablanca using a cron job configured as follows:

`0 12 * * * /path/to/rx_poc.sh >> /path/to/rx_poc.log 2>&1`

This ensures consistent daily collection of weather data for ongoing accuracy monitoring.
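
Standard cron evaluates the hour field in the machine's own time zone, so `0 12` means noon in Casablanca only if the host clock is set to that zone. The sketch below shows one way to work out the matching local hour and build the entry. Both helper names are hypothetical. It ignores half-hour offsets and must be re-run when either zone changes its UTC offset.

```shell
# Print the hour (0-23) on this machine's clock at which it is 12:00 in zone $1.
local_hour_for_noon_in() {
    zone=$1
    # Strip one leading zero so values like "08" are not parsed as octal.
    there=$(TZ="$zone" date +%H); there=${there#0}
    here=$(date +%H);             here=${here#0}
    echo $(( (12 + here - there + 24) % 24 ))
}

# Print a crontab line: run script $2 at hour $1, appending all output to $3.
cron_entry() {
    printf '0 %s * * * %s >> %s 2>&1\n' "$1" "$2" "$3"
}
```

For instance, `cron_entry "$(local_hour_for_noon_in Africa/Casablanca)" /path/to/rx_poc.sh /path/to/rx_poc.log` prints a line that can be pasted into `crontab -e`.
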
The system was built as a **proof-of-concept** to monitor and measure discrepancies between forecasted and observed temperatures, forming the foundational block for a broader analytics initiative.
## 📌 Objectives Achieved
* ✔️ Automated the extraction of **daily weather data** using `curl` from [wttr.in](https://wttr.in).
* ✔️ Parsed and transformed raw text output to isolate:
  * Observed temperature at **noon (local time)**.
  * Forecasted temperature for **noon the following day**.
* ✔️ Loaded clean, structured data into a **tab-separated log file**, forming a historical report.
* ✔️ Scheduled the process using a **cron job**, ensuring daily execution at the specified time.
* ✔️ Designed the output in a **tabular format**, ready for further analysis and modeling.
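
The extract, transform, and load steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual `rx_poc.sh`, and all function names are hypothetical. It assumes the report is requested with wttr.in's `T` option (plain text without colour codes) and that the first temperature in the report is the current observation. It also assumes that each day's table has four temperature cells, so that the seventh temperature is tomorrow's noon forecast. A value in parentheses, as in `+25(26) °C`, is assumed to be a feels-like figure and is ignored. `grep -o` is a GNU/BSD extension.

```shell
# Print every temperature found on stdin, one per line, in reading order.
extract_temps() {
    grep -Eo '[+-]?[0-9]+(\([+-]?[0-9]+\))? *°C' |
        sed 's/^+//; s/[^0-9-].*$//'   # drop a leading "+", then everything after the number
}

# Current observation: ASSUMED to be the first temperature in report file $1.
obs_temp_from_report() { extract_temps < "$1" | sed -n '1p'; }

# Tomorrow's noon forecast: ASSUMED to be the 7th temperature
# (1 current + 4 cells for today + tomorrow's Morning, then Noon).
fc_temp_from_report() { extract_temps < "$1" | sed -n '7p'; }

# Print one tab-separated row: year, month, day, obs_temp ($1), fc_temp ($2).
format_row() {
    # %t is a tab; a single date call keeps the three fields consistent.
    printf '%s\t%s\t%s\n' "$(TZ='Africa/Casablanca' date '+%Y%t%m%t%d')" "$1" "$2"
}

# Download the report, parse it, and append one row to log file $1.
run_etl() {
    log=${1:-rx_poc.log}
    report=$(mktemp) || return 1
    curl -fsS 'https://wttr.in/Casablanca?T' -o "$report" || { rm -f "$report"; return 1; }
    obs=$(obs_temp_from_report "$report")
    fc=$(fc_temp_from_report "$report")
    rm -f "$report"
    # Refuse to log a row if parsing failed.
    [ -n "$obs" ] && [ -n "$fc" ] || return 1
    format_row "$obs" "$fc" >> "$log"
}
```

Calling `run_etl /path/to/rx_poc.log` once a day appends a single row. Counting temperatures by position is fragile, so the parsing functions should be checked against a saved report whenever the provider's layout changes.
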
## 🧪 Sample Output
```plaintext
year month day obs_temp fc_temp
2023 01 01 10 11
2023 01 02 11 12
2023 01 03 12 10
2023 01 04 13 13
2023 01 05 10 9
2023 01 06 11 10
```

The project uses the following tools:

- 🐧 Bash Shell Scripting
- ⌚ Cron Scheduler
- 🌐 `curl` for HTTP-based data retrieval
- 📦 wttr.in as the weather data provider

While this POC focused on a single city and data source, the same design can be extended to support:
- 🌍 Multiple locations
- 📡 Multiple forecast sources
- ⏱️ Configurable update frequencies
- 🌪️ Additional weather metrics (wind, visibility, precipitation...)
By completing this project, the following competencies were reinforced:
- ✅ Bash scripting for automation
- ✅ Web scraping with `curl`
- ✅ Text parsing and data cleaning with Unix tools
- ✅ Cron job scheduling
- ✅ Forecast accuracy calculation and categorization
- ✅ Weekly statistical reporting with basic Bash logic