This repository contains the official code for the ACL 2025 main conference paper: BookCoref: Coreference Resolution at Book Scale by Giuliano Martinelli, Tommaso Bonomo, Pere-Lluís Huguet Cabot and Roberto Navigli. We include the official outputs of the comparison systems outlined in the paper, which can be used to reproduce our results. Our silver training and gold evaluation data are available through this 🤗 Hugging Face dataset.
First of all, clone the repository:
git clone https://github.com/sapienzanlp/bookcoref.git
Then, create a Python virtual environment and install the requirements. We support Python 3.9 and above.
pip install -r requirements.txt
To download the BookCoref data for training and evaluation, run the download_data.py script:
python download_data.py
options:
--format <"jsonl" or "conll">, default="jsonl" # Format of the dataset to download
--configuration <"default" or "split">, default="default" # Configuration of the huggingface dataset, either 'default' or 'split'
--output_dir <path>, default="data/" # Output directory for the dataset
This script downloads the data from 🤗 Hugging Face and saves it in either JSONL or CoNLL format to the default directory data/.
BookCoref is a collection of annotated books. Each item contains the annotations of one book following the structure of OntoNotes:
{
doc_id: "pride_and_prejudice_1342", # (str) i.e., ID of the document
gutenberg_key: "1342", # (str) i.e., key of the book in Project Gutenberg
sentences: [["CHAPTER", "I."], ["It", "is", "a", "truth", "universally", "acknowledged", ...], ...], # list[list[str]] i.e., list of word-tokenized sentences
clusters: [[[79,80], [81,82], ...], [[2727,2728]...], ...], # list[list[list[int]]] i.e., list of clusters' mention offsets
characters: [
{
name: "Mr Bennet",
cluster: [[79,80], ...],
},
{
name: "Mr. Darcy",
cluster: [[2727,2728], [2729,2730], ...],
}
] # list[character], list of character objects, each consisting of a name and mention offsets, i.e., dict[name: str, cluster: list[list[int]]]
}
We also include information on character names, which is not exploited in traditional coreference settings but could be useful in future work.
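To make the schema concrete, here is a minimal sketch of reading a record and resolving mention offsets back to text. The toy record below mirrors the fields shown above; treating each [start, end] pair as inclusive indices into the flattened token list is an assumption for illustration.

```python
# Toy record mirroring the BookCoref schema shown above (not real data).
record = {
    "doc_id": "toy_book_0",
    "sentences": [["Mr.", "Bennet", "smiled", "."], ["He", "said", "nothing", "."]],
    "clusters": [[[0, 1], [4, 4]]],
    "characters": [{"name": "Mr Bennet", "cluster": [[0, 1], [4, 4]]}],
}

# Flatten the word-tokenized sentences into one token list,
# so cluster offsets can index into it directly.
tokens = [tok for sentence in record["sentences"] for tok in sentence]

def mention_text(offset):
    """Recover the surface form of a mention, assuming inclusive offsets."""
    start, end = offset
    return " ".join(tokens[start : end + 1])

for character in record["characters"]:
    mentions = [mention_text(m) for m in character["cluster"]]
    print(f'{character["name"]}: {mentions}')
```

The same loop works on a real book once a line of the downloaded JSONL is parsed with `json.loads`.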
To evaluate the outputs of a model on the BookCoref benchmark, run the evaluate.py script:
python evaluate.py
options:
--predictions <path_to_predictions> # Path to the predictions file to evaluate.
--mode <"full", "split", "gold_window">, default="full" # Evaluation mode.
We provide three evaluation modes:
Mode | Description |
---|---|
full | Evaluates model predictions on the full books of test.jsonl. Input: predictions on the full test set books. Output: scores on the full books of test.jsonl, referred to as BookCoref<sub>gold</sub> results in our paper. |
split | Evaluates model predictions on test_split.jsonl. Input: predictions on the split version of our test set books. Output: scores on the split version (test_split.jsonl), referred to as Split-BookCoref<sub>gold</sub> results in our paper. |
gold_window | Evaluates predictions carried out on the full test.jsonl against test_split.jsonl, by splitting clusters every 1,500 tokens. Input: predictions on the full test set books. Output: scores on the split version (test_split.jsonl), referred to as BookCoref<sub>gold+window</sub> results in our paper. |
To replicate the results of our paper, run evaluate.py specifying the path to the predictions of the model you are interested in.
Example:
$ python evaluate.py --predictions predictions/finetuned_bookcoref/maverick_xl.jsonl
Evaluation Results:
muc:
precision: 92.95
recall: 95.70
f1: 94.30
b_cubed:
precision: 43.08
recall: 77.19
f1: 55.30
ceafe:
precision: 37.10
recall: 30.46
f1: 33.45
conll2012:
precision: 57.71
recall: 67.78
f1: 61.02
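The conll2012 score in the output above is the standard unweighted average of the MUC, B-cubed, and CEAF-e scores, which can be checked directly:

```python
# Verify the CoNLL-2012 F1 reported above as the mean of the three metric F1s.
muc_f1, b_cubed_f1, ceafe_f1 = 94.30, 55.30, 33.45

conll2012_f1 = round((muc_f1 + b_cubed_f1 + ceafe_f1) / 3, 2)
print(conll2012_f1)  # -> 61.02
```

The same averaging applied to the precision and recall columns reproduces 57.71 and 67.78.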
This work has been published at ACL 2025 (main conference). If you use any artifact, please cite our paper as follows:
@inproceedings{martinelli-etal-2025-bookcoref,
title = "{BOOKCOREF}: Coreference Resolution at Book Scale",
author = "Martinelli, Giuliano and
Bonomo, Tommaso and
Huguet Cabot, Pere-Llu{\'i}s and
Navigli, Roberto",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1197/",
pages = "24526--24544",
ISBN = "979-8-89176-251-0",
}
The data and software are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0).