- Background
- Data
- Models
- Timeline
- Repo Structure
- Logistics
- Resources
- Contact
The goal of this project is to develop an AI-powered question-answering system that automatically analyzes Climate Action Plans (CAPs) and other climate adaptation and mitigation documentation. The system will be capable of extracting key data about climate vulnerabilities, planned mitigation measures, and socio-economic and geographic context, providing well-sourced, accurate responses to user queries.
Climate change poses an urgent challenge for cities worldwide, prompting the creation of comprehensive Climate Action Plans (CAPs) to mitigate impacts and adapt to evolving conditions. These plans detail strategies for reducing emissions, addressing vulnerabilities, and protecting populations from climate risks, but their length and complexity make it difficult for city planners, researchers, and policymakers to efficiently extract and compare key information across regions.
This project addresses that challenge by developing an AI-powered question-answering system that automates the extraction of critical information from CAPs. Using Natural Language Processing (NLP) and Machine Learning (ML) techniques, the system analyzes thousands of pages of climate documentation and provides accurate, well-sourced responses to climate-related inquiries, with LangChain facilitating the organization and structuring of extracted data for more efficient analysis.
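As a rough, hedged illustration of the retrieval-based approach described above, the sketch below wires a single CAP PDF into a LangChain retrieval QA chain. The file path, model name, and chain setup are placeholder assumptions for the example, not the project's actual configuration, and an `OPENAI_API_KEY` must be set in the environment.

```python
# Minimal retrieval-QA sketch (assumes langchain, langchain-openai,
# langchain-community, faiss-cpu, and pypdf are installed).
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load and chunk one Climate Action Plan (the path is a placeholder).
pages = PyPDFLoader("CAPS/example_city_cap.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

# Embed the chunks and build a searchable vector store.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Answer a question, returning source chunks so answers stay well-sourced.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
result = qa.invoke({"query": "What flood-related vulnerabilities does the plan identify?"})
print(result["result"])
```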
Climate Action Plans can be found in the CAPS folder. External data sources are housed on Box.
Fall 2024 (September through December 2024, initially)
This repository contains code for two main components:
- Data ingestion and processing portal
- Climate Action Plan Tracker
The Climate Action Plan Tracker is hosted on Streamlit Cloud as well as HuggingFace Spaces. It may also be run locally. The data ingestion and processing portal is designed to be run locally only.
The repository also contains batch scripts. Run these scripts when no data (Climate Action Plan summaries, vector stores, and dataset) has been generated yet.
Users can run the tools with the following commands:
- `streamlit run data_ingestion_app.py` to run the data ingestion and processing portal
- `streamlit run app.py` to run the Climate Action Plan Tracker
/data_ingestion_helpers
contains the helper functions used in the data ingestion process. Each run of the data ingestion process does the following (a hypothetical sketch of the sequence follows this list):
- Saves the new Climate Action Plan to the CAPS folder
- Collects the city's metadata (City, State, County, and City Center Coordinates) and updates the city_county_mapping.csv file
- Generates a summary of the Climate Action Plan and stores it in the CAPS_Summaries folder
- Creates the vector stores of the Climate Action Plan used in the QA tool (Individual, Summary, and Combined vector stores)
- Queries an LLM to update the climate action plans dataset in climate_actions_plans.csv
- Updates the CAPS plans list in caps_plans.csv
- Re-runs the maps_data.py script to update the data powering the maps part of the tool
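The sketch below shows, purely hypothetically, how these steps could be sequenced in code. The function name and step implementations are invented for illustration and are not the repository's actual API; the LLM-backed steps are stubbed out as comments.

```python
# Hypothetical orchestration of the ingestion steps listed above.
import csv
import shutil
from pathlib import Path

def ingest_cap(pdf_path: str, city: str, state: str, county: str) -> None:
    # 1. Save the new plan into the CAPS folder.
    caps = Path("CAPS")
    caps.mkdir(exist_ok=True)
    shutil.copy(pdf_path, caps / Path(pdf_path).name)

    # 2. Append the city's metadata to city_county_mapping.csv.
    with open("city_county_mapping.csv", "a", newline="") as f:
        csv.writer(f).writerow([city, state, county])

    # 3-5. Summary generation, vector-store creation, and the LLM-built
    # dataset row would happen here (see the batch scripts described below).

    # 6. Rebuild the plans list from the CAPS folder contents.
    with open("caps_plans.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["plan_file"])
        for plan in sorted(caps.glob("*.pdf")):
            writer.writerow([plan.name])

    # 7. maps_data.py would be re-run here to refresh the maps data.
```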
batch_summary_generation.py
generates summaries for all CAPs in the CAPS folder and saves them in the CAPS_Summaries folder
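A hedged sketch of what such a batch job could look like, assuming LangChain's map-reduce summarization chain and an OpenAI chat model (both are assumptions for the example, not the script's confirmed internals):

```python
# Illustrative batch summarization over the CAPS folder.
# Assumes langchain, langchain-openai, and pypdf; OPENAI_API_KEY must be set.
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains.summarize import load_summarize_chain
from langchain_openai import ChatOpenAI

chain = load_summarize_chain(ChatOpenAI(model="gpt-4o-mini"), chain_type="map_reduce")
out_dir = Path("CAPS_Summaries")
out_dir.mkdir(exist_ok=True)
for pdf in Path("CAPS").glob("*.pdf"):
    pages = PyPDFLoader(str(pdf)).load()
    summary = chain.invoke({"input_documents": pages})["output_text"]
    (out_dir / f"{pdf.stem}_summary.txt").write_text(summary)
```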
caps_directory_reader.py
reads in the CAPS plans in the CAPS folder and saves the data to a csv file called caps_plans.csv
census_county_data.py
reads in the census data and saves the data to a csv file called us_counties.csv which is used by the data ingestion tool
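For illustration only, a minimal pandas version of that kind of step; the input file name and column names are placeholders, not the script's actual inputs:

```python
# Hypothetical sketch: trim a raw census extract down to the county table.
import pandas as pd

raw = pd.read_csv("census_counties_raw.csv")           # placeholder input file
counties = raw[["STNAME", "CTYNAME", "POPESTIMATE"]]   # assumed column names
counties.to_csv("us_counties.csv", index=False)
```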
create_vector_stores.py
creates the vector stores of the Climate Action Plan used in the QA tool (Individual, Summary, and Combined vector stores)
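A hedged sketch of how the three store types could be built; the vector-store backend (FAISS), embedding model, and output paths are assumptions made for the example:

```python
# Illustrative sketch: individual, summary, and combined vector stores.
# Assumes langchain-community, langchain-openai, faiss-cpu, and pypdf.
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
embeddings = OpenAIEmbeddings()

plan_chunks, summary_chunks = [], []
for pdf in Path("CAPS").glob("*.pdf"):
    chunks = splitter.split_documents(PyPDFLoader(str(pdf)).load())
    plan_chunks.extend(chunks)
    # Individual store: one vector store per plan (path is a placeholder).
    FAISS.from_documents(chunks, embeddings).save_local(f"vectorstores/{pdf.stem}")
for txt in Path("CAPS_Summaries").glob("*.txt"):
    summary_chunks.extend(splitter.split_documents(TextLoader(str(txt)).load()))

# Summary store over all summaries; combined store over plans plus summaries.
FAISS.from_documents(summary_chunks, embeddings).save_local("vectorstores/summary")
FAISS.from_documents(plan_chunks + summary_chunks, embeddings).save_local("vectorstores/combined")
```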
dataset_generation.py
queries an LLM to create the climate action plans dataset in climate_actions_plans.csv
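For illustration, one hedged way such a script might query an LLM per plan summary and accumulate rows; the prompt, model, and CSV columns are invented for the example:

```python
# Hypothetical sketch: one LLM call per plan summary, results written to CSV.
# Assumes langchain-openai; OPENAI_API_KEY must be set.
import csv
from pathlib import Path
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
with open("climate_actions_plans.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["plan", "key_vulnerabilities"])  # invented columns
    for summary in Path("CAPS_Summaries").glob("*.txt"):
        prompt = (
            "From this Climate Action Plan summary, list the key climate "
            "vulnerabilities as a short comma-separated phrase:\n\n"
            + summary.read_text()
        )
        writer.writerow([summary.stem, llm.invoke(prompt).content])
```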
In most cases, these batch process files will not need to be run.
/maps_helpers
contains the helper functions used in the maps tool, along with the data that powers it
To run the tool, run `streamlit run app.py` in a terminal. Please ensure that all necessary packages listed in the requirements.txt file have been installed; they can be installed using pip: `pip install -r requirements.txt`
The Prompts folder contains all the system prompt templates used in the tool. These can be edited to change the behavior of the tools.
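For example, a template could be loaded and filled in like this; the file name and template variables are placeholders, since the actual templates live in the Prompts folder:

```python
# Hedged example of loading a system prompt template from the Prompts folder.
from langchain.prompts import PromptTemplate

template_text = open("Prompts/qa_system_prompt.txt").read()  # placeholder file name
prompt = PromptTemplate.from_template(template_text)          # assumes {context}/{question} variables
print(prompt.format(context="<retrieved CAP excerpts>", question="<user question>"))
```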
Sprint planning: Every Monday, 10:00-10:30am, on Zoom.
Backlog grooming: N/A / as needed.
Sprint retrospective: Every Friday, 1:30-2:00pm, on Zoom.
Demos: Every Friday at 3pm on Zoom as well as in person at the DSI.
Data location: Climate Policy Data
Slack channel: climate-policy on the Data Science TIP Slack organization. Please check your email for an invite.
The following resources can help readers get up to speed on the project:
- LangChain: Please see LangChain Tutorials
- Python usage: Whirlwind Tour of Python, Jake VanderPlas (Book, Notebooks)
- Data science packages in Python: Python Data Science Handbook, Jake VanderPlas
- HuggingFace: Website, Course/Training, Inference using pipelines, Fine tuning models
- fast.ai: Course, Quick start
- h2o: Resources, documentation, and API links
- nbdev: Overview, Tutorial
- Git tutorials: Simple Guide, Learn Git Branching
- ACCRE how-to guides: DSI How-tos
Project Lead: Umang Chaudhry, Senior Data Scientist, Vanderbilt Data Science Institute
PI: Dr. JB Ruhl, David Daniels Allen Distinguished Chair in Law, Vanderbilt University Law School
Project Manager: Isabella Urquia
Team Members: Ethan Thorpe, Mariah Caballero, Harmony Wang, Xuanxuan Chen, Aparna Lakshmi