
EmergenCease

Overview

This project builds a pipeline for generating multilingual emergency alerts using the Federal Communications Commission's (FCC) Wireless Emergency Alert (WEA) templates in 14 languages. It classifies the emergency type, extracts key information, and fills out multilingual templates in the Common Alerting Protocol (CAP) format. The goal is to improve alert accessibility for non-native speakers.

Common Alerting Protocol (CAP)

CAP is an XML-based data format for exchanging public warnings and emergency information between alerting technologies. CAP allows a warning message to be disseminated consistently and simultaneously over many warning systems to many applications, such as Google Public Alerts and Cell Broadcast. CAP increases warning effectiveness and simplifies the task of activating a warning for responsible officials. In this project, we use a JSON-based version of CAP.
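
For illustration, a minimal JSON-based CAP message might look like the following. The field names come from the CAP 1.2 specification, and the values are made up; the exact structure this project expects is defined by the schema described under Miscellaneous:

{
  "identifier": "EXAMPLE-ALERT-001",
  "sender": "alerts@example.gov",
  "sent": "2024-05-01T16:00:00-05:00",
  "status": "Actual",
  "msgType": "Alert",
  "scope": "Public",
  "info": {
    "language": "en-US",
    "category": "Met",
    "event": "Tornado Warning",
    "urgency": "Immediate",
    "severity": "Extreme",
    "certainty": "Observed",
    "headline": "Tornado Warning issued for Dane County",
    "description": "A tornado has been spotted near the area.",
    "instruction": "Take shelter now.",
    "web": "https://www.weather.gov",
    "area": {
      "areaDesc": "Dane County, WI"
    }
  }
}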

For more information on CAP, refer to Additional Resources.

Federal Communications Commission (FCC) Wireless Emergency Alert (WEA) Templates

The FCC WEA templates provide standardized multilingual templates for 18 different emergency types (including Test Alert) in 15 languages (including American Sign Language (ASL)). Each template includes up to four placeholders: sending agency [SENDING_AGENCY], affected area [LOCATION], expiration time [TIME], and a URL for additional information [URL]. The information for these placeholders is what we aim to extract. Additionally, we include an “Other” event type to handle emergency types not covered by the existing templates. We currently do not support the ASL template.
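
As a sketch of how a template gets filled, the following Python snippet uses an illustrative template and made-up field values (the wording is not verbatim FCC text):

# Sketch: fill WEA template placeholders with extracted values.
# The template wording and field values below are made up for illustration.
template = (
    "A Tornado Warning has been issued by [SENDING_AGENCY] for [LOCATION] "
    "until [TIME]. Visit [URL] for more information."
)

fields = {
    "[SENDING_AGENCY]": "National Weather Service",
    "[LOCATION]": "Dane County, WI",
    "[TIME]": "5:00 PM CDT",
    "[URL]": "https://www.weather.gov",
}

alert = template
for placeholder, value in fields.items():
    alert = alert.replace(placeholder, value)

print(alert)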

For more information on the FCC Wireless Emergency Alert Templates, refer to Additional Resources.

Getting Started

Prerequisites

  • Python 3.11 or higher
  • Hugging Face account and authentication token for model access
  • NVIDIA GPU (optional, for fine-tuning)

Installation

Clone the repository:

git clone https://github.com/yongsinp/EmergenCease.git
cd EmergenCease

Create a virtual environment and install the required dependencies:

# Create and activate virtual environment
conda create -n EmergenCease python=3.11 -y
conda activate EmergenCease

# Install dependencies
pip install -r requirements.txt

Add the following line to .bashrc, or run it directly, to set up your Hugging Face authentication token:

export HF_TOKEN="YOUR_HF_TOKEN"  # Replace with your actual Hugging Face token

Usage

The project's main code is located in the src directory. Run the scripts below from the project root, with the project root on your PYTHONPATH:

export PYTHONPATH=/PATH_TO_PROJECT/EmergenCease:$PYTHONPATH

Running the Pipeline

Run the following command with at least one of the --headline, --description, or --instruction arguments to generate a multilingual CAP alert. Alternatively, you can pass the --cap argument, which expects a JSON string conforming to the Common Alerting Protocol (CAP) format.

By default, the pipeline generates the alert using the Llama 3.2 1B Instruct model with a LoRA adapter trained on a small set of Integrated Public Alert and Warning System (IPAWS) Archived Alerts data.

# Running the command without any arguments generates a sample Tornado Warning alert
python -m src.cap_translator.translate

# Provide your own alert text with the --headline, --description, and --instruction arguments
python -m src.cap_translator.translate --headline ALERT_HEADLINE --description ALERT_DESCRIPTION --instruction ALERT_INSTRUCTION

Run python -m src.cap_translator.translate -h for help.
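
For example, a concrete invocation might look like this (the alert text is made up):

python -m src.cap_translator.translate \
    --headline "Tornado Warning for Dane County" \
    --description "A tornado has been spotted near the area." \
    --instruction "Take shelter in a basement or interior room now."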

[NOTE]
The run_eval.sh script in the project root runs the sample alert translation shown above and the evaluation pipeline described below. You must set the HF_TOKEN environment variable in the script.

Downloading Models

The Llama 3.2 1B Instruct model is downloaded automatically from Hugging Face to the models directory. Models may require authentication due to licensing restrictions. To use these models, you need to have a Hugging Face account, set up your authentication token, and request access to the models if necessary. Refer to Additional Resources for more information on Hugging Face authentication.

You can pass the token with the model name, or set the HF_TOKEN environment variable as described under Installation.

You can use the following command to manually download the model of your choice:

python -m src.utils.model  # --model "meta-llama/Llama-3.2-1B-Instruct" --hf-token "YOUR_HF_TOKEN" 

Run python -m src.utils.model -h for help.

Fine-tuning

You can fine-tune Llama 3 models (3.1-8B, 3.2-1B, and 3.2-3B) for better performance. If the model does not already exist in the models directory, it will be downloaded automatically. The trained LoRA adapters will be saved in the models directory.

python -m src.finetune.finetune  # --model 3.2-1B --epochs 3 --batch-size 4 --log-level INFO

Run python -m src.finetune.finetune -h for help.

Evaluation

To evaluate the performance of the extraction model, you can run the following command:

python -m src.eval.eval  # --model meta-llama/Llama-3.2-1B-Instruct --adapter LoRA-Llama-3.2-1B-Instruct --test-data ./data/finetune/finetune_test.csv --runs 5

Run python -m src.eval.eval -h for help.

(Optional) Downloading the Dataset

A small set of tagged data is provided in the data/finetune directory. The full IPAWS Archived Alerts dataset can be downloaded by running the following command:

python -m src.data.download

For more information on the dataset, refer to Additional Resources.

(Optional) Preprocessing

The full IPAWS Archived Alerts dataset will be preprocessed, split, and saved in the data directory. The dataset will be automatically downloaded if it does not already exist. The sum of the train, validation, and test splits must equal 1.0.

python -m src.preprocess.preprocess  # --train 0.8 --val 0.1 --test 0.1 --random-seed 575 --sample-per-class 2

Run python -m src.preprocess.preprocess -h for help.

To convert the split dataset from CSV to JSON format, run the following command:

python -m src.data.convert

Miscellaneous

Common Alerting Protocol (CAP) Templates

The CAP templates are stored at src/cap_translator/cap_templates.json, in the following format:

{
  "EVENT1": {
    "LANGUAGE1": "EVENT1 template in LANGUAGE1",
    "LANGUAGE2": "EVENT1 template in LANGUAGE2",
    ...
  },
  "EVENT2": {
    "LANGUAGE1": "EVENT2 template in LANGUAGE1",
    "LANGUAGE2": "EVENT2 template in LANGUAGE2",
    ...
  }
}

Common Alerting Protocol (CAP) JSON Schema

The Common Alerting Protocol (CAP) JSON schema is stored at src.data.cap.SCHEMA. This schema defines the structure and constraints for CAP messages, ensuring that generated alerts conform to the expected format.

Possible values for status, msgType, scope, urgency, category, severity, certainty, responseType, and others, including language and region codes, are defined in src/data/enums.py.
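
For instance, a generated alert could be checked against this schema with the jsonschema package. This is a sketch under the assumption that SCHEMA is a standard JSON Schema document; the repository may perform validation differently:

# Sketch: validate a CAP dictionary against the project's schema.
# Assumes SCHEMA is a standard JSON Schema document and that the
# jsonschema package is installed; the repository may validate differently.
import jsonschema

from src.data.cap import SCHEMA

alert = {
    "status": "Actual",
    "msgType": "Alert",
    # ... remaining CAP fields ...
}

try:
    jsonschema.validate(instance=alert, schema=SCHEMA)
    print("Alert conforms to the CAP schema.")
except jsonschema.ValidationError as err:
    print(f"Invalid alert: {err.message}")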

Emergency (Event) and Nested Field Mapping

The event types found in the IPAWS Archived Alerts dataset are mapped to the FCC Wireless Emergency Alert Template events using the file at src/preprocess/event_map.yaml, in the following format:

FCC_EVENT_TYPE_1:
  - IPAWS_EVENT_TYPE_1
  - IPAWS_EVENT_TYPE_2
FCC_EVENT_TYPE_2:
  - IPAWS_EVENT_TYPE_3
  - IPAWS_EVENT_TYPE_4
...
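
For example, this mapping could be loaded with PyYAML and inverted to look up the FCC event type for a given IPAWS event. This is a sketch; the repository's actual loading code may differ:

# Sketch: load the FCC -> IPAWS event mapping and invert it for lookups
# in the other direction. Uses PyYAML; the repository's code may differ.
import yaml

with open("src/preprocess/event_map.yaml") as f:
    event_map = yaml.safe_load(f)  # {FCC_EVENT: [IPAWS_EVENT, ...]}

ipaws_to_fcc = {
    ipaws_event: fcc_event
    for fcc_event, ipaws_events in event_map.items()
    for ipaws_event in ipaws_events
}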

The nested field names found in the CAP templates are mapped to the internal field names using the file at src/preprocess/ner_config.yaml, in the following format:

nested.field.names1: internal_field_name1
nested.field.names2: internal_field_name2
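
A dotted name like nested.field.names1 can be resolved by walking the CAP dictionary one key at a time. A minimal sketch (the helper name and the example path are illustrative, not the repository's API):

# Sketch: resolve a dotted field name against a nested CAP dictionary.
# get_nested is a hypothetical helper, not part of the repository's API.
def get_nested(cap: dict, dotted_name: str):
    value = cap
    for key in dotted_name.split("."):
        value = value[key]
    return value

# e.g., get_nested(alert, "info.event") might return "Tornado Warning"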

Additional Resources

Acknowledgments

Parts of this work were done on the University of Washington’s high-performance computing cluster, Hyak, which is funded by the Student Technology Fee.
