ltgoslo/ucdp-aec

UCDP-AEC (Abstractive Event analysis Corpus)

This repository contains the software and data associated with the paper:
Abstractive Event Analysis of Armed Conflicts: Introducing the UCDP-AEC Dataset

Dataset Preparation

The data directory contains the dataset splits in two formats: Hugging Face datasets (e.g. datasets.load_from_disk("train")) and jsonl. In both cases, the source_article field contains HPLT document IDs. The easiest way to work with this dataset is to first replace those IDs with the actual HPLT documents. To that end, the following script is provided:

python aec/ids_to_documents.py data/UCDP-AEC data/UCDP-AEC-ids

If you don't want to install the Hugging Face datasets library, you can convert only the jsonl files by adding the -J argument to that command:

python aec/ids_to_documents.py -J data/UCDP-AEC data/UCDP-AEC-ids
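
Once the IDs have been replaced, the jsonl version of a split can be read with the Python standard library alone. The following is a minimal sketch, assuming one JSON object per line; the helper name read_jsonl and the file path in the usage comment are illustrative and not part of the repository.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return the records of a jsonl file as a list of dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


# Hypothetical usage; the actual file names under data/ may differ:
# train = read_jsonl("data/UCDP-AEC/train.jsonl")
# print(train[0]["source_article"])
```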

Evaluation Script

For model evaluation, generate a jsonl file with one prediction per line, such as:

{"id": 442069, "side_a_name": "Government of Myanmar (Burma)", "side_b_name": "ULA", "start_date": "2022-05-26", "end_date": "2022-05-26", "location_root_name": "Myanmar (Burma)", "location_adm1_name": "Chin state", "location_adm2_name": "Mindat district", "location_where_name": "Paletwa town", "deaths_side_a": 2, "deaths_side_b": 0, "deaths_civilian": 0, "deaths_unknown": 0, "deaths_low": 2, "deaths_high": 3}

Note that the id and deaths fields are typed as integers; everything else is typed as strings.
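
The typing rule above can be checked before writing the prediction file. The sketch below is not part of the repository: the helper names check_prediction and write_predictions are hypothetical, the field lists are copied from the example prediction, and it assumes every field is present in every prediction (whether aec/evaluate.py accepts missing or null values is not stated here).

```python
import json

# Field names taken from the example prediction line.
INT_FIELDS = ("id", "deaths_side_a", "deaths_side_b", "deaths_civilian",
              "deaths_unknown", "deaths_low", "deaths_high")
STR_FIELDS = ("side_a_name", "side_b_name", "start_date", "end_date",
              "location_root_name", "location_adm1_name",
              "location_adm2_name", "location_where_name")


def check_prediction(pred):
    """Raise TypeError if a field has the wrong type (KeyError if it is missing)."""
    for name in INT_FIELDS:
        value = pred[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} should be an integer, got {value!r}")
    for name in STR_FIELDS:
        value = pred[name]
        if not isinstance(value, str):
            raise TypeError(f"{name} should be a string, got {value!r}")


def write_predictions(predictions, path):
    """Validate each prediction and write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for pred in predictions:
            check_prediction(pred)
            handle.write(json.dumps(pred, ensure_ascii=False) + "\n")
```

A typical mistake this catches is a death count serialised as the string "2" instead of the integer 2.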

It is good practice to drop those fields (except id) from the test set after loading it, to make sure you are using generate and not teacher forcing. Then use aec/evaluate.py to evaluate the model.
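
Dropping the target fields could look like the sketch below, which works on plain dicts such as those read from the jsonl files. The name strip_targets is hypothetical, and the list of target fields is an assumption based on the example prediction (every field except id).

```python
# Target fields, assumed from the example prediction; id is deliberately kept.
TARGET_FIELDS = frozenset((
    "side_a_name", "side_b_name", "start_date", "end_date",
    "location_root_name", "location_adm1_name",
    "location_adm2_name", "location_where_name",
    "deaths_side_a", "deaths_side_b", "deaths_civilian",
    "deaths_unknown", "deaths_low", "deaths_high",
))


def strip_targets(record):
    """Return a copy of a test record without the fields the model must predict."""
    return {key: value for key, value in record.items()
            if key not in TARGET_FIELDS}
```

With the Hugging Face datasets format, the same effect should be achievable with the library's Dataset.remove_columns method, given the list of target field names.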

Other Code Released

The hplt_align directory contains code used for HPLT document matching.

The analysis directory contains scripts we used to generate the statistics given in the paper.

The baselines directory contains model code used in the experiments. Some subdirectories are modified versions of existing code: Text2Event and DEGREE. See the dedicated README for details on how to run the models.

Citation

The proceedings are not published yet, but a preprint can be found here.

@inproceedings{simon-etal-2025-abstractive,
    title     = {Abstractive Event Analysis of Armed Conflicts: Introducing the {UCDP-AEC} Dataset},
    author    = {Simon, \'{E}tienne and Olsen, Helene B\o{}sei and Carre\~{n}o, Ram\'{o}n and Mishra, Rahul and Arefyev, Nikolay and Yilmaz, Mert Can and \O{}vrelid, Lilja and Velldal, Erik},
    year      = {2025},
    month     = sep,
    booktitle = {Proceedings of the 5th Workshop on Computational Linguistics for the Political and Social Sciences},
    publisher = {Association for Computational Linguistics},
    address   = {Hildesheim, Germany},
}

Licence

We release all the code in this repository under the GNU AGPL licence, with the exception of the content of the baselines/Text2Event and baselines/DEGREE directories, which keep the licences of their original authors. We release our modifications to these directories under the same licence as the original code, that is, MIT for Text2Event and Apache 2.0 for DEGREE.
