
AI-Driven Assessment of Trends in Climate Policy

Quick navigation

Background
Data
Models
Timeline
Repo Structure
Logistics
Resources
Contact

Goal

The goal of this project is to develop an AI-powered question-answering system that automatically analyzes Climate Action Plans (CAPs) and other climate adaptation and mitigation documentation. The system will be capable of extracting key data about climate vulnerabilities, planned mitigation measures, and socio-economic and geographic context, providing well-sourced, accurate responses to user queries.

Background

Climate change poses an urgent challenge for cities worldwide, prompting the creation of comprehensive Climate Action Plans (CAPs) to mitigate impacts and adapt to evolving conditions. These plans detail strategies for reducing emissions, addressing vulnerabilities, and protecting populations from climate risks, but their length and complexity make it difficult for city planners, researchers, and policymakers to efficiently extract and compare key information across regions.

This project addresses that challenge by developing an AI-powered question-answering system that automates the extraction of critical information from CAPs. Using Natural Language Processing (NLP) and Machine Learning (ML) techniques, the system analyzes thousands of pages of climate documentation and provides accurate, well-sourced responses to climate-related inquiries, with LangChain facilitating the organization and structuring of extracted data for more efficient analysis.
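As a rough illustration of this kind of pipeline, the sketch below loads a prebuilt vector store for one plan and answers a question over it with LangChain. The model name, store path, and chain setup are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch of retrieval-based QA over a CAP vector store.
# Paths and model names below are illustrative assumptions.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings()

# Load a previously built vector store for a single Climate Action Plan.
store = FAISS.load_local(
    "vector_stores/example_city",  # hypothetical path
    embeddings,
    allow_dangerous_deserialization=True,
)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,  # keep sources so answers stay citable
)

result = qa.invoke({"query": "What flood-related vulnerabilities does the plan identify?"})
print(result["result"])
```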

Data

Climate Action Plans can be found in the CAPS folder. External data sources are housed on Box.

Timeline

Fall 2024 (initially September through December 2024)

Repo Structure

This repository contains code for two main components:

  1. Data ingestion and processing portal
  2. Climate Action Plan Tracker

The Climate Action Plan Tracker is hosted on Streamlit Cloud as well as HuggingFace Spaces. It may also be run locally. The data ingestion and processing portal is designed to be run locally only.

The repository also contains batch scripts. Run these scripts when no data exists yet (Climate Action Plan summaries, vector stores, and dataset).

Users can run the tools using the following commands:

`streamlit run data_ingestion_app.py` to run the data ingestion and processing portal

`streamlit run app.py` to run the Climate Action Plan Tracker

/data contains all the external data sources used in the maps tool

/data_ingestion_helpers contains the helper functions used in the data ingestion process. Each run of the data ingestion process does the following (a sketch follows the list):

  1. Saves the new Climate Action Plan to the CAPS folder
  2. Collects the metadata of the city (City, State, County, and City Center Coordinates) and updates the city_county_mapping.csv file
  3. Generates a summary of the Climate Action Plan and stores it in the CAPS_Summaries folder
  4. Creates the vector stores of the Climate Action Plan used in the QA tool (individual, summary, and combined vector stores)
  5. Queries an LLM to update the climate action plans dataset in climate_actions_plans.csv
  6. Updates the CAPS plans list in caps_plans.csv
  7. Re-runs the maps_data.py script to update the data powering the maps part of the tool
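The sketch below illustrates how one ingestion run might be orchestrated. It covers only the file-level bookkeeping; the summary, vector-store, and LLM steps (3-5) are left as placeholders, and all names are illustrative stand-ins for the actual helpers in /data_ingestion_helpers.

```python
# Sketch of one ingestion run, reduced to the file-level bookkeeping steps.
# Column layouts and paths are assumptions, not the repository's real schema.
import csv
import shutil
from pathlib import Path

def ingest_cap(pdf_path: Path, city: str, state: str, county: str) -> None:
    # Step 1: save the new Climate Action Plan into the CAPS folder.
    caps_dir = Path("CAPS")
    caps_dir.mkdir(exist_ok=True)
    dest = caps_dir / pdf_path.name
    shutil.copy(pdf_path, dest)

    # Step 2: record the city metadata in city_county_mapping.csv.
    with open("city_county_mapping.csv", "a", newline="") as f:
        csv.writer(f).writerow([city, state, county])

    # Steps 3-5 would call the summary, vector-store, and LLM-dataset helpers.

    # Step 6: update the CAPS plans list.
    with open("caps_plans.csv", "a", newline="") as f:
        csv.writer(f).writerow([dest.name, city, state])

    # Step 7: re-run the maps data script to refresh the maps tool, e.g.
    # subprocess.run(["python", "maps_data.py"], check=True)
```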

/batch_scripts contains scripts that can be run to batch process CAPs.

batch_summary_generation.py generates summaries for all CAPs in the CAPS folder and saves them in the CAPS_Summaries folder
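For illustration, a batch summarization pass over the CAPS folder might look like the sketch below, here using a LangChain map-reduce chain; the model and chain setup are assumptions and may differ from the actual script.

```python
# Sketch of batch summary generation; model and chain choices are assumptions.
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")

out_dir = Path("CAPS_Summaries")
out_dir.mkdir(exist_ok=True)

for pdf in Path("CAPS").glob("*.pdf"):
    docs = PyPDFLoader(str(pdf)).load()  # one Document per page
    summary = chain.invoke({"input_documents": docs})["output_text"]
    (out_dir / f"{pdf.stem}_summary.txt").write_text(summary)
```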

caps_directory_reader.py reads in the CAPS plans in the CAPS folder and saves the data to a csv file called caps_plans.csv
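A minimal version of this kind of directory reader is sketched below; the CSV columns are illustrative guesses rather than the script's actual schema.

```python
# Sketch of listing the CAPS folder into caps_plans.csv; columns are guesses.
import csv
from pathlib import Path

rows = [(p.name, p.stat().st_size) for p in sorted(Path("CAPS").glob("*.pdf"))]

with open("caps_plans.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "size_bytes"])
    writer.writerows(rows)
```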

census_county_data.py reads in the census data and saves the data to a csv file called us_counties.csv which is used by the data ingestion tool
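The general shape of this step might be as sketched below, assuming a locally downloaded raw census file; the input file and column names are placeholders, since the real source isn't documented here.

```python
# Sketch of shaping raw census data into us_counties.csv.
# "census_raw.csv" and its columns are hypothetical placeholders.
import pandas as pd

census = pd.read_csv("census_raw.csv", dtype={"STATE": str, "COUNTY": str})
counties = census[["STATE", "COUNTY", "NAME"]].drop_duplicates()
counties.to_csv("us_counties.csv", index=False)
```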

create_vector_stores.py creates the vector stores of the Climate Action Plan used in the QA tool (Individual, Summary and Combined Vector Stores)
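The sketch below shows one way the three stores could be built with FAISS; chunk sizes, file names, and output paths are illustrative assumptions.

```python
# Sketch of building individual, summary, and combined FAISS stores.
# File names, chunking parameters, and paths are assumptions.
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = OpenAIEmbeddings()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

plan_docs = splitter.split_documents(PyPDFLoader("CAPS/example_plan.pdf").load())
summary_docs = splitter.split_documents(
    TextLoader("CAPS_Summaries/example_plan_summary.txt").load()
)

# One store per source, plus a combined store over both.
FAISS.from_documents(plan_docs, embeddings).save_local("vector_stores/individual")
FAISS.from_documents(summary_docs, embeddings).save_local("vector_stores/summary")
FAISS.from_documents(plan_docs + summary_docs, embeddings).save_local("vector_stores/combined")
```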

dataset_generation.py queries an LLM to create the climate action plans dataset in climate_actions_plans.csv
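As a sketch, a single dataset row might be produced as below; the prompt and extracted fields are hypothetical, since the real ones come from the project's prompt templates.

```python
# Sketch of querying an LLM for one row of climate_actions_plans.csv.
# The prompt and fields are hypothetical placeholders.
import csv
from pathlib import Path
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

summary = Path("CAPS_Summaries/example_plan_summary.txt").read_text()
prompt = (
    "From the following Climate Action Plan summary, list the city, the "
    "target emissions-reduction year, and the top three mitigation measures.\n\n"
    + summary
)
answer = llm.invoke(prompt).content

with open("climate_actions_plans.csv", "a", newline="") as f:
    csv.writer(f).writerow(["example_plan", answer])
```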

In most cases, these batch process files will not need to be run.

/maps_helpers contains the helper functions used in the maps tool and stores the data powering the maps tool

To run the tool, run `streamlit run app.py` in a terminal. Please ensure that all necessary packages have been installed as per the requirements.txt file; they can be installed with pip: `pip install -r requirements.txt`

The Prompts folder contains all the system prompt templates used in the tool. These can be edited to change the tools' behavior.
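For example, a template might be loaded and combined with a user question as sketched below; the file name and placeholder variable are assumptions.

```python
# Sketch of loading a system prompt template from the Prompts folder.
# "qa_system_prompt.txt" and the {question} placeholder are assumptions.
from pathlib import Path
from langchain_core.prompts import ChatPromptTemplate

template_text = Path("Prompts/qa_system_prompt.txt").read_text()
prompt = ChatPromptTemplate.from_messages(
    [("system", template_text), ("human", "{question}")]
)
messages = prompt.format_messages(question="What are the plan's adaptation goals?")
```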

Project logistics

Sprint planning: Every Monday, 10-10:30am, on Zoom.

Backlog Grooming: NA / as needed.

Sprint Retrospective: Every Friday, 1:30-2pm, on Zoom.

Demos: Every Friday at 3pm on Zoom as well as in person at the DSI.

Data location: Climate Policy Data

Slack channel: climate-policy on Data Science TIP slack organization. Please check your email for an invite.

Resources

Provide any useful resources to get readers up to speed with the project here.

Contact Info

Project Lead: Umang Chaudhry, Senior Data Scientist, Vanderbilt Data Science Institute
PI: Dr. JB Ruhl, David Daniels Allen Distinguished Chair in Law, Vanderbilt University Law School
Project Manager: Isabella Urquia
Team Members: Ethan Thorpe, Mariah Caballero, Harmony Wang, Xuanxuan Chen, Aparna Lakshmi
