This is an example project to illustrate some organization and reproducibility concepts for the Policy & Data Studio course in Summer 2020.
You will first need to download and install Python 3.
Then clone this project repository, open a command prompt/terminal window, and navigate to the project directory.
Now you can create a virtual environment for Python. On macOS you can use the following steps:
pip3 install virtualenv # if you don't already have this installed
virtualenv env
source env/bin/activate
which python # confirm you are using the virtual env
pip3 install -r requirements.txt
python -m ipykernel install --user --name=env
jupyter notebook
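As a complement to the `which python` check above, the same thing can be confirmed from inside Python. This is a minimal sketch rather than part of the project: the function name `in_virtual_env` is made up for illustration, and it relies only on the standard `sys` attributes that `venv` and `virtualenv` set.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 keep sys.base_prefix pointing at the system
    # installation while sys.prefix points at the environment directory.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtual_env())
```

Run it with `python` after `source env/bin/activate`; the interpreter path it prints should sit inside the `env` directory.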
To recreate all the results you will need to run all of the numbered scripts and notebooks in /code:
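A minimal sketch of one way to do that in a single pass, assuming the directory is `code` relative to the project root, that every step's filename starts with a digit, and that notebooks can be executed with the `jupyter nbconvert` command installed alongside Jupyter. The helper names (`ordered_steps`, `command_for`, `run_all`) are hypothetical and not part of this repository.

```python
import subprocess
import sys
from pathlib import Path

RUNNABLE_SUFFIXES = {".py", ".ipynb"}


def ordered_steps(code_dir):
    """Return the numbered scripts and notebooks in code_dir, sorted by filename."""
    steps = [
        p for p in Path(code_dir).iterdir()
        if p.is_file() and p.suffix in RUNNABLE_SUFFIXES and p.name[0].isdigit()
    ]
    # Zero-padded prefixes (01_, 02_, ..., 99-1_) sort correctly as plain strings.
    return sorted(steps, key=lambda p: p.name)


def command_for(path):
    """Build the command line that runs a single step."""
    if path.suffix == ".ipynb":
        # Execute the notebook and write the outputs back into the same file.
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", str(path)]
    # Plain scripts run with the interpreter of the active environment.
    return [sys.executable, str(path)]


def run_all(code_dir="code"):
    """Run every step in order, stopping at the first failure."""
    for step in ordered_steps(code_dir):
        print("running", step.name)
        subprocess.run(command_for(step), check=True)
```

Calling `run_all()` from the project directory with the virtual environment active would run the steps in order. Whether the scripts expect to be launched from the project root or from inside `code` is not stated here, so the working directory may need adjusting.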
Alternatively, you can run each step individually following the numbered ordering of files:
01_download-tracts.py
- This downloads a shapefile for all NYC census tracts (2010) from NYC's Open Data portal, and saves the files in /data/raw
02_download-acs.py
- This downloads ACS summary file data for NYC tracts from a separate project, where it was originally downloaded with R using the tidycensus package to access the Census API.
03_clean-join-tract-data.ipynb
- This reads in the two raw data files created above, calculates some new ACS variables, and joins the ACS data with the tract geometries from the shapefile. The final clean tract-level dataset with geometries is saved to /data/clean
99-1_tract-maps.ipynb