This project extracts data from the NOAA Historical Storm Events Dataset into a CSV file for Harris County, Texas. It can be adapted to extract historical data for any other county in the US.

Built with:
- Python 3.7.3
- VS Code
- csv (standard library)
- glob (standard library)
- gzip (standard library)
- numpy 1.19.1
- pandas 1.0.5
- sys (standard library)
- os (standard library)
1. Install virtualenv globally (for example, with `pip install virtualenv`).
2. In the desired folder, create and activate your virtual environment, then install the required modules:
```bash
# virtualenv 20+ isolates the environment from global site-packages by default
# and no longer accepts the old --no-site-packages flag.

# Virtualenv modules installation (Windows based systems)
$ virtualenv env
# Activate Virtualenv
$ .\env\Scripts\activate

# Virtualenv modules installation (Unix based systems - Linux/Mac)
$ virtualenv env
### OR
$ virtualenv --python=/usr/bin/python3 env
# Activate Virtualenv
$ source env/bin/activate

# Install modules (both systems)
$ pip3 install -r requirements.txt
```
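To confirm that the activated environment is the interpreter actually running the scripts, a quick check like the following can help. This is a minimal sketch and not part of the repository; the file name `check_env.py` is hypothetical. It relies on `sys.prefix` differing from `sys.base_prefix` inside an environment created by venv or virtualenv 20+, and on the `sys.real_prefix` attribute that older virtualenv releases set.

```python
# check_env.py (hypothetical helper, not part of this repository)
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment while
    # sys.base_prefix keeps the base installation; legacy virtualenv (< 20)
    # set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run it with `python check_env.py` after activation; if it reports `False`, the activate step did not take effect in the current shell.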
Scripts:

- `basefunctions.py`: contains all the core functions for this task.
- `extract-gz-to-csv.py`: extracts all CSV files from the gzip archives into the matching category folder.
- `summarize-data-for-harris-county.py`: filters the dataset for Harris County, Texas, and summarizes it into a CSV file for each category.
- `table-join.py`: uses the pandas library to join the fatalities, locations, and details tables into a final CSV file (`EVENT_ID` is the primary key field).
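The filtering and joining steps can be sketched with pandas as below. This is an illustration under assumptions, not the code in `summarize-data-for-harris-county.py` or `table-join.py`: it assumes the details CSV exposes the NOAA columns `STATE`, `CZ_TYPE` and `CZ_NAME`, it keeps only county-level rows (`CZ_TYPE == "C"`, so forecast-zone events are dropped), and the helper names `filter_county` and `join_tables` are hypothetical. Passing a different state and county name is one way to adapt it to another US county.

```python
import pandas as pd


def filter_county(details: pd.DataFrame, state: str, county: str) -> pd.DataFrame:
    """Keep county-level events (CZ_TYPE == "C") for one state and county.

    Assumes the NOAA details columns STATE, CZ_TYPE and CZ_NAME; names are
    compared case-insensitively.
    """
    mask = (
        (details["STATE"].str.upper() == state.upper())
        & (details["CZ_TYPE"] == "C")
        & (details["CZ_NAME"].str.upper() == county.upper())
    )
    return details.loc[mask]


def join_tables(details: pd.DataFrame, locations: pd.DataFrame,
                fatalities: pd.DataFrame) -> pd.DataFrame:
    """Left-join locations and fatalities onto the details rows by EVENT_ID."""
    joined = details.merge(locations, on="EVENT_ID", how="left", suffixes=("", "_loc"))
    return joined.merge(fatalities, on="EVENT_ID", how="left", suffixes=("", "_fat"))


if __name__ == "__main__":
    # Made-up rows for illustration; in practice load the extracted files,
    # e.g. details = pd.read_csv(path_to_details_csv) (hypothetical path).
    details = pd.DataFrame({
        "EVENT_ID": [1, 2, 3],
        "STATE": ["TEXAS", "TEXAS", "LOUISIANA"],
        "CZ_TYPE": ["C", "Z", "C"],
        "CZ_NAME": ["HARRIS", "HARRIS", "CALCASIEU"],
    })
    locations = pd.DataFrame({"EVENT_ID": [1, 3], "LATITUDE": [29.76, 30.2]})
    fatalities = pd.DataFrame({"EVENT_ID": [1], "FATALITY_TYPE": ["D"]})
    harris = filter_county(details, "Texas", "Harris")
    print(join_tables(harris, locations, fatalities))
```

A left join keeps county events that have no location or fatality records. Because one event can have several location or fatality rows, the joined table can hold more than one row per `EVENT_ID`.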
Run the scripts in this order:

1. `extract-gz-to-csv.py`
2. `summarize-data-for-harris-county.py`
3. `table-join.py`
You will need to pass the source directory (the folder where your scripts and the dataset are located). The scripts then create the following structure:
- Source Dir
  - Scripts
  - .gzip files
  - StormEvents CSV (new)
    - N00_Original (new)
      - details (new)
      - locations (new)
      - fatalities (new)
    - N01_InProcess (new)
    - N02_Final (new)
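The folder setup and extraction step can be sketched with the standard-library modules listed above plus `shutil`. This is a hypothetical reconstruction rather than the repository's `extract-gz-to-csv.py`: it assumes `N00_Original`, `N01_InProcess` and `N02_Final` sit inside `StormEvents CSV`, that the archives end in `.csv.gz`, and that the NOAA file names contain `_details-`, `_locations-` or `_fatalities-`. The function names are illustrative.

```python
import glob
import gzip
import os
import shutil

# Assumed category folder names, mirroring the tree above.
CATEGORIES = ("details", "locations", "fatalities")


def build_tree(source_dir: str) -> str:
    """Create StormEvents CSV/N00_Original/<category>, N01_InProcess and N02_Final."""
    root = os.path.join(source_dir, "StormEvents CSV")
    for category in CATEGORIES:
        os.makedirs(os.path.join(root, "N00_Original", category), exist_ok=True)
    for stage in ("N01_InProcess", "N02_Final"):
        os.makedirs(os.path.join(root, stage), exist_ok=True)
    return root


def extract_archives(source_dir: str) -> list:
    """Decompress every *.csv.gz in source_dir into the folder of its category."""
    root = build_tree(source_dir)
    written = []
    for archive in sorted(glob.glob(os.path.join(source_dir, "*.csv.gz"))):
        name = os.path.basename(archive)
        # Assumed naming pattern: StormEvents_<category>-ftp_v1.0_...csv.gz
        category = next((c for c in CATEGORIES if f"_{c}-" in name), None)
        if category is None:
            continue  # not one of the three tables this project uses
        target = os.path.join(root, "N00_Original", category, name[: -len(".gz")])
        # Stream the decompressed bytes straight into the target CSV file.
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Because `os.makedirs(..., exist_ok=True)` is idempotent, rerunning the sketch on the same source directory does not fail on folders that already exist; existing CSV files are overwritten.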
Made with ❤️ by Wilson Franca 👋