This repository contains the software and data associated with the paper:
Abstractive Event Analysis of Armed Conflicts: Introducing the UCDP-AEC Dataset
The data directory contains the dataset splits in two formats: Hugging Face datasets (e.g. datasets.load_from_disk("train")) and jsonl.
In both cases, the source_article field contains HPLT document IDs.
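If you only need the jsonl files, they can be read with the Python standard library. The following is a minimal sketch, not part of this repository: the helper name read_jsonl is hypothetical, and it only assumes that each non-empty line holds one JSON object.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate empty lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

Call it with the path to one of the jsonl split files in the data directory. Each returned record is a plain dict whose source_article value is an HPLT document ID until the conversion script has been run.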
The easiest way to work with this dataset is to first replace those IDs with the actual HPLT documents.
To that end, the following script is provided:
python aec/ids_to_documents.py data/UCDP-AEC data/UCDP-AEC-ids
If you don't want to install the Hugging Face datasets library, you can convert only the jsonl files by adding the -J argument to that command:
python aec/ids_to_documents.py -J data/UCDP-AEC data/UCDP-AEC-ids
For model evaluation, generate a jsonl file with one prediction per line, such as:
{"id": 442069, "side_a_name": "Government of Myanmar (Burma)", "side_b_name": "ULA", "start_date": "2022-05-26", "end_date": "2022-05-26", "location_root_name": "Myanmar (Burma)", "location_adm1_name": "Chin state", "location_adm2_name": "Mindat district", "location_where_name": "Paletwa town", "deaths_side_a": 2, "deaths_side_b": 0, "deaths_civilian": 0, "deaths_unknown": 0, "deaths_low": 2, "deaths_high": 3}
Note that the id and deaths_* fields are typed as integers; everything else is typed as strings.
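A quick type check before writing the prediction file can catch mistakes such as a death count emitted as a string. This is a minimal sketch under the typing rule stated above; the function name mistyped_fields is hypothetical, and how aec/evaluate.py itself reacts to wrongly typed values is not assumed here.

```python
import json


def mistyped_fields(prediction):
    """Return the names of fields whose type breaks the rule:
    id and deaths_* must be integers, every other field a string."""
    bad = []
    for name, value in prediction.items():
        if name == "id" or name.startswith("deaths_"):
            # bool is a subclass of int in Python, so reject it explicitly
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, str)
        if not ok:
            bad.append(name)
    return bad


example = json.loads('{"id": 442069, "side_b_name": "ULA", "deaths_side_a": "2"}')
print(mistyped_fields(example))  # ['deaths_side_a']
```

An empty list means the record follows the typing rule.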
It is good practice to drop the prediction fields shown above (all except id) from the test set after loading it, to make sure you are using generate and not teacher forcing. Then use aec/evaluate.py to evaluate the model.
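Dropping the target fields could look like the sketch below for records loaded from jsonl as plain dicts. The field list is copied from the example prediction above; the names TARGET_FIELDS and strip_targets are hypothetical and not part of this repository.

```python
# Target fields taken from the example prediction; id is deliberately kept
# so that predictions can be matched back to the gold events.
TARGET_FIELDS = (
    "side_a_name", "side_b_name", "start_date", "end_date",
    "location_root_name", "location_adm1_name", "location_adm2_name",
    "location_where_name", "deaths_side_a", "deaths_side_b",
    "deaths_civilian", "deaths_unknown", "deaths_low", "deaths_high",
)


def strip_targets(record):
    """Return a copy of the record without any of the target fields."""
    return {key: value for key, value in record.items() if key not in TARGET_FIELDS}
```

Applying strip_targets to every test record leaves only id and the input fields. With the Hugging Face format, Dataset.remove_columns should achieve the same effect.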
The hplt_align directory contains code used for HPLT document matching.
The analysis directory contains scripts we used to generate the statistics given in the paper.
The baselines directory contains the model code used in the experiments. Some subdirectories are modified versions of existing code: Text2Event and DEGREE. See the dedicated README for details on how to run the models.
The proceedings are not published yet, but a preprint can be found here.
@inproceedings{simon-etal-2025-abstractive,
title = {Abstractive Event Analysis of Armed Conflicts: Introducing the {UCDP-AEC} Dataset},
author = {Simon, \'{E}tienne and Olsen, Helene B\o{}sei and Carre\~{n}o, Ram\'{o}n and Mishra, Rahul and Arefyev, Nikolay and Yilmaz, Mert Can and \O{}vrelid, Lilja and Velldal, Erik},
year = {2025},
month = sep,
booktitle = {Proceedings of the 5th Workshop on Computational Linguistics for the Political and Social Sciences},
publisher = {Association for Computational Linguistics},
address = {Hildesheim, Germany},
}
We release all the code in this repository under the GNU AGPL licence, with the exception of the contents of the baselines/Text2Event and baselines/DEGREE directories, which keep the licences of their original authors. We release our modifications to these directories under the same licence as the original code, that is, MIT for Text2Event and Apache 2.0 for DEGREE.