This project builds a pipeline to generate multilingual emergency alerts using the Federal Communications Commission's (FCC) Wireless Emergency Alert (WEA) templates in 14 languages. It classifies the emergency type, extracts key information, and fills in multilingual templates in the Common Alerting Protocol (CAP) format. The goal is to improve alert accessibility for non-native speakers.
CAP is an XML-based data format for exchanging public warnings and emergency information between alerting technologies. CAP allows a warning message to be disseminated consistently and simultaneously over many warning systems to many applications, such as Google Public Alerts and Cell Broadcast. CAP increases warning effectiveness and simplifies the task of activating a warning for responsible officials. In this project, we use a JSON-based version of CAP.
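As a rough illustration, a CAP alert carries routing metadata plus one or more `info` blocks describing the event. The fragment below uses field names from the CAP 1.2 specification; the exact JSON shape used by this project is defined by its own schema (see `src.data.cap.SCHEMA` below), so treat this only as a sketch.

```json
{
  "identifier": "EXAMPLE-001",
  "sender": "alerts@example.gov",
  "sent": "2024-05-01T12:00:00-05:00",
  "status": "Actual",
  "msgType": "Alert",
  "scope": "Public",
  "info": [
    {
      "language": "en-US",
      "category": ["Met"],
      "event": "Tornado Warning",
      "urgency": "Immediate",
      "severity": "Extreme",
      "certainty": "Observed",
      "headline": "Tornado Warning issued for Example County",
      "instruction": "Take shelter now.",
      "area": [{"areaDesc": "Example County"}]
    }
  ]
}
```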
For more information on CAP, refer to Additional Resources.
The FCC WEA templates provide standardized multilingual message text for 18 emergency types (including Test Alert) in 15 spoken languages (including American Sign Language (ASL)). Each template includes up to four placeholders: sending agency `[SENDING_AGENCY]`, affected area `[LOCATION]`, expiration time `[TIME]`, and a URL for additional information `[URL]`. The information for these placeholders is what we aim to extract. Additionally, we include an "Other" event type to handle emergency types not covered by the existing templates. We currently do not support the ASL template.
For more information on the FCC Wireless Emergency Alert Templates, refer to Additional Resources.
- Python 3.11 or higher
- Hugging Face account and authentication token for model access
- NVIDIA GPU (optional, for fine-tuning)
Clone the repository:
```bash
git clone https://github.com/yongsinp/EmergenCease.git
cd EmergenCease
```
Create a virtual environment and install the required dependencies:
```bash
# Create and activate virtual environment
conda create -n EmergenCease python=3.11 -y
conda activate EmergenCease

# Install dependencies
pip install -r requirements.txt
```
Add the following to `.bashrc`, or run the command, to set up your Hugging Face authentication token:

```bash
export HF_TOKEN="YOUR_HF_TOKEN" # Replace with your actual Hugging Face token
```
The project's main code is located in the `src` directory. Use this directory as the working directory to run scripts.

```bash
export PYTHONPATH=/PATH_TO_PROJECT/EmergenCease:$PYTHONPATH
```
Run the following command with at least one of the `--headline`, `--description`, or `--instruction` arguments to generate a multilingual CAP alert. You can also run the command with the `--cap` argument, which expects a JSON string conforming to the Common Alerting Protocol (CAP) format.
By default, the pipeline uses the Llama 3.2 1B Instruct model with a LoRA adapter trained on a small set of Integrated Public Alert and Warning System (IPAWS) Archived Alerts data to generate the alert.
```bash
# Running the command without any arguments will generate a sample Tornado Warning alert
python -m src.cap_translator.translate --headline ALERT_HEADLINE --description ALERT_DESCRIPTION --instruction ALERT_INSTRUCTION
```
Run `python -m src.cap_translator.translate -h` for help.
[NOTE]
The `run_eval.sh` script in the project root runs the sample alert translation shown above and the evaluation pipeline described below. You must set the `HF_TOKEN` environment variable in the script.
The Llama 3.2 1B Instruct model is downloaded automatically from Hugging Face to the `models` directory. Models may require authentication due to licensing restrictions. To use these models, you need a Hugging Face account, an authentication token, and, if necessary, approved access to the models. Refer to Additional Resources for more information on Hugging Face authentication.
You can pass the token with the model name, or set an `HF_TOKEN` environment variable. To set the environment variable, run the following command or add it to `.bashrc`:

```bash
export HF_TOKEN="YOUR_HF_TOKEN" # Replace with your actual Hugging Face token
```
You can use the following command to manually download the model of your choice:
```bash
python -m src.utils.model # --model "meta-llama/Llama-3.2-1B-Instruct" --hf-token "YOUR_HF_TOKEN"
```
Run `python -m src.utils.model -h` for help.
You can fine-tune Llama 3 (`3.1-8B`, `3.2-1B`, and `3.2-3B`) models for better performance. If the model does not already exist in the `models` directory, it will be downloaded automatically. The trained LoRA adapters will be saved in the `models` directory.
```bash
python -m src.finetune.finetune # --model 3.2-1B --epochs 3 --batch-size 4 --log-level INFO
```
Run `python -m src.finetune.finetune -h` for help.
To evaluate the performance of the extraction model, you can run the following command:
```bash
python -m src.eval.eval # --model meta-llama/Llama-3.2-1B-Instruct --adapter LoRA-Llama-3.2-1B-Instruct --test-data ./data/finetune/finetune_test.csv --runs 5
```
Run `python -m src.eval.eval -h` for help.
A small set of tagged data is provided in the `data/finetune` directory. The full IPAWS Archived Alerts dataset can be downloaded by running the following command:

```bash
python -m src.data.download
```
For more information on the dataset, refer to Additional Resources.
The full IPAWS Archived Alerts dataset will be preprocessed, split, and saved in the `data` directory. The dataset will be downloaded automatically if it does not already exist. The sum of the train, validation, and test splits must equal 1.0.
```bash
python -m src.preprocess.preprocess # --train 0.8 --val 0.1 --test 0.1 --random-seed 575 --sample-per-class 2
```
Run `python -m src.preprocess.preprocess -h` for help.
The split dataset can be converted from CSV format to JSON format by running the following command:

```bash
python -m src.data.convert
```
The CAP templates are stored at `src/cap_translator/cap_templtes.json`, in the following format:
```
{
  "EVENT1": {
    "LANGUAGE1": "EVENT1 template in LANGUAGE1",
    "LANGUAGE2": "EVENT1 template in LANGUAGE2",
    ...
  },
  "EVENT2": {
    "LANGUAGE1": "EVENT2 template in LANGUAGE1",
    "LANGUAGE2": "EVENT2 template in LANGUAGE2",
    ...
  }
}
```
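To illustrate how a template file in this format is used, the sketch below looks up a template by event type and language and fills in the placeholders. The template text and the `fill_template` helper are hypothetical, not the project's actual implementation.

```python
# Hypothetical sketch: look up a template by event and language, then
# substitute each [PLACEHOLDER] with its extracted value. The real
# templates live in src/cap_translator/cap_templtes.json.
TEMPLATES = {
    "Tornado Warning": {
        "English": "Tornado Warning issued by [SENDING_AGENCY] for "
                   "[LOCATION] until [TIME]. Take shelter now. Details: [URL]",
    }
}

def fill_template(event: str, language: str, fields: dict) -> str:
    text = TEMPLATES[event][language]
    for key, value in fields.items():
        text = text.replace(f"[{key}]", value)
    return text

alert = fill_template("Tornado Warning", "English", {
    "SENDING_AGENCY": "NWS",
    "LOCATION": "King County",
    "TIME": "5:00 PM",
    "URL": "https://example.gov",
})
```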
The Common Alerting Protocol (CAP) JSON schema is stored at `src.data.cap.SCHEMA`. This schema defines the structure and constraints for CAP messages, ensuring that generated alerts conform to the expected format.
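A schema like this is typically used to validate a generated alert before it is emitted. The sketch below shows the idea with a minimal hand-rolled required-field check; the field list is illustrative, and the authoritative definition is `src.data.cap.SCHEMA` itself.

```python
# Illustrative schema-style validation: verify an alert carries a set of
# required top-level CAP fields. The field list here is an assumption,
# not the project's actual schema.
REQUIRED_FIELDS = ("identifier", "sender", "sent", "status", "msgType", "scope")

def missing_fields(alert: dict) -> list:
    """Return the required fields absent from the alert."""
    return [f for f in REQUIRED_FIELDS if f not in alert]

alert = {"identifier": "X-1", "sender": "alerts@example.gov",
         "sent": "2024-05-01T12:00:00-05:00", "status": "Actual",
         "msgType": "Alert", "scope": "Public"}
assert missing_fields(alert) == []
```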
Possible values for `status`, `msgType`, `scope`, `urgency`, `category`, `severity`, `certainty`, `responseType`, and others, including language and region codes, are defined in `src/data/enums.py`.
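As a sketch of how such value sets can be modeled, the enumerations below use the `status` and `urgency` values from the CAP 1.2 specification. The class and member names are illustrative; the project's actual definitions live in `src/data/enums.py`.

```python
from enum import Enum

class Status(Enum):
    # Values defined by CAP 1.2 for the <status> element.
    ACTUAL = "Actual"
    EXERCISE = "Exercise"
    SYSTEM = "System"
    TEST = "Test"
    DRAFT = "Draft"

class Urgency(Enum):
    # Values defined by CAP 1.2 for the <urgency> element.
    IMMEDIATE = "Immediate"
    EXPECTED = "Expected"
    FUTURE = "Future"
    PAST = "Past"
    UNKNOWN = "Unknown"

# A raw string can be validated by constructing the enum from it:
assert Status("Actual") is Status.ACTUAL
```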
The event types found in the IPAWS Archived Alerts dataset are mapped to the FCC Wireless Emergency Alert Template events using the file at `src/preprocess/event_map.yaml`, in the following format:
```yaml
FCC_EVENT_TYPE_1:
  - IPAWS_EVENT_TYPE_1
  - IPAWS_EVENT_TYPE_2
FCC_EVENT_TYPE_2:
  - IPAWS_EVENT_TYPE_3
  - IPAWS_EVENT_TYPE_4
...
```
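At preprocessing time, a mapping in this FCC-to-IPAWS direction is naturally inverted so that each IPAWS event can look up its FCC template event. The event names below are illustrative, not taken from the project's actual `event_map.yaml`.

```python
# Hypothetical excerpt in the event_map.yaml shape: FCC template event
# mapped to a list of IPAWS event names.
event_map = {
    "Tornado Warning": ["TOR"],
    "Flash Flood Warning": ["FFW", "FFS"],
}

# Invert the mapping: IPAWS event -> FCC template event.
ipaws_to_fcc = {ipaws: fcc
                for fcc, ipaws_events in event_map.items()
                for ipaws in ipaws_events}
```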
The nested field names found in the CAP templates are mapped to the internal field names using the file at `src/preprocess/ner_config.yaml`, in the following format:
```yaml
nested.field.names1: internal_field_name1
nested.field.names2: internal_field_name2
```
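A dotted path on the left-hand side of such a mapping can be resolved against a nested alert dictionary as sketched below. The helper and the field names are illustrative, not the project's actual code.

```python
def get_nested(data: dict, dotted_path: str):
    """Resolve a dotted path like 'info.area.areaDesc' in a nested dict."""
    for key in dotted_path.split("."):
        data = data[key]
    return data

# Hypothetical nested CAP fragment for illustration.
cap = {"info": {"area": {"areaDesc": "King County"}}}
assert get_nested(cap, "info.area.areaDesc") == "King County"
```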
- Common Alerting Protocol Version 1.2
- FCC Multilingual Wireless Emergency Alert Templates
- OpenFEMA Dataset: IPAWS Archived Alerts - v1
- Llama 3.2 1B Instruct
- Hugging Face User Access Tokens
- Manage Your Hugging Face Access Tokens
Parts of this work were done on the University of Washington’s high-performance computing cluster, Hyak, which is funded by the Student Technology Fee.