Code for Eagle Rock Analytics' cloud-based, historical weather observations data platform
The Historical Observations Data Platform is a cloud-based platform of historical weather observations that gives California's energy sector access to high-quality, open climate and weather data. This work is supported by California Energy Commission grant PIR-19-006. This repository contains the code (Python scripts and Jupyter Notebooks) for the full processing pipeline that ingests data into the Historical Data Platform.
The Platform responds to community partner needs in understanding weather and climate information, including the severity, duration, frequency, and rate of change over time of extreme weather events, and supports projections downscaling efforts. We implement stringent Quality Assurance/Quality Control (QA/QC) procedures in line with international protocols, with customized modifications relevant to the energy sector (such as temperature and precipitation extremes, winds, and solar radiation).
Warning
This project is still under active development.
The Platform has sourced station data from 27 publicly available historical data observation networks across the Western Electricity Coordinating Council (WECC) domain, covering 1980-2022 (the time period varies between networks and stations). In total, 14,927 stations have completed the full quality control and standardization pipelines and are publicly available as cloud-optimized Zarr stores in Amazon S3 storage.
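As a rough illustration of what reading one station from S3 might look like, here is a minimal sketch. The bucket name, the `network/station_id.zarr` layout, and both helper names are assumptions made for this example, not the Platform's documented structure; see the `data-access/` folder for the project's own examples. `open_station` relies on `xarray.open_zarr` with anonymous S3 access and needs `xarray`, `zarr`, and `s3fs` installed.

```python
def station_zarr_uri(bucket: str, network: str, station_id: str) -> str:
    """Build an S3 URI for one station's Zarr store.

    The bucket/network/station layout is hypothetical; substitute the
    real path from the data-access examples.
    """
    return f"s3://{bucket}/{network}/{station_id}.zarr"


def open_station(uri: str):
    """Lazily open a publicly readable Zarr store as an xarray Dataset."""
    # Imported here so station_zarr_uri stays usable without xarray installed
    import xarray as xr

    # anon=True requests unauthenticated access, which suits public buckets
    return xr.open_zarr(uri, storage_options={"anon": True})


# Placeholder names only; these do not refer to a real bucket or station
uri = station_zarr_uri("example-bucket", "EXAMPLENET", "EXAMPLENET_0001")
```

Calling `open_station(uri)` with a real path would return a dataset whose variables are loaded only when accessed, which is the main benefit of the cloud-optimized format.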
The following figure shows the locations of all the stations (by network) that have completed our quality control and standardization process:
And here you can see the number of observations throughout the project's time period:
historical-obs-platform/
├── data/ # Miscellaneous supporting data
├── data-access/ # Code examples for accessing our data
├── environment/ # Files for building the computational environment, including a README with further instructions
├── figures/ # Visualizations
├── notebooks/ # Jupyter notebooks for data visualization and analysis
├── scripts/ # Data processing code for all steps of the QAQC process
│ ├── 1_pull_data/ # Scripts for retrieving/scraping network station data from their respective sources
│ ├── 2_clean_data/ # Scripts for cleaning individual networks to a consistent standard
│ ├── 3_qaqc_data/ # Scripts to QA/QC stations
│ ├── 4_merge_data/ # Scripts to close out processing and standardize to hourly timesteps. Data are fully processed at the end of this step.
│ ├── misc/ # Scripts that don't fit into any other categories
│ ├── pcluster/ # Code and shell scripts for running QAQC and merge scripts in an AWS pcluster environment
│ └── tests/ # Scripts for testing finalized data products
└──
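The `4_merge_data/` step above standardizes observations to hourly timesteps. The sketch below shows one simple way such a standardization could work, by averaging all values that fall within the same clock hour. It is an illustration only: the function name `to_hourly` is hypothetical, and the repository's scripts may use a different rule (for example, taking the nearest observation, or summing precipitation rather than averaging it).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def to_hourly(observations):
    """Average (timestamp, value) pairs into one value per clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        # Truncate each timestamp to the start of its hour
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Return the hourly means in chronological order
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Made-up sub-hourly temperature readings for illustration
sample = [
    (datetime(2020, 1, 1, 0, 5), 10.0),
    (datetime(2020, 1, 1, 0, 35), 12.0),
    (datetime(2020, 1, 1, 1, 10), 8.0),
]
hourly = to_hourly(sample)
```

Here the two readings in the 00:00 hour are combined into a single value, and the 01:00 hour keeps its one reading.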
See the environment folder for instructions and files for building the computational environment for this project.
This project is licensed under the GNU GPLv3 - see the LICENSE file for details.
- 📧 Email: info@eaglerockanalytics.com
- 🐛 Issues: GitHub Issues