Code submitted with the paper titled "Measuring the Impact of Narrative Complexity on Knowledge Graph Embeddings" to ISWC 2025.
In this paper, we investigate how narrative semantic and syntactic levels impact embedding performance. This repo contains all the code to construct the KGs and to prepare the data for training with the following three methods: ULTRA, SimKGC, and ILP.
The `examples` folder contains 3 examples of KGs constructed on revolutions. Upon acceptance of the paper, we will release all constructed KGs, as well as the resulting datasets for the experiments, on Zenodo.
Detailed description of the content:
- `kg_construction` folder: all related to the KG construction part. All files should be run from within that folder.
  - `causation.py`: extracting causation layer
  - `causation.sh`: extract causation narrative layer (script)
  - `config_complex.json`: config used for subevent extraction
  - `representations.py`: class for KG conversion
  - `convert_kg.py`: converting KG to different syntaxes (script)
  - `frames.py`: extracting role layer
  - `gsf.py`: extracting base, prop, subevent, text layers
  - `utils.py`: utils (generic)
- `prep_data.py`: preparing data (see file for arguments)
- `utils_inductive.py`: utils (inductive splits)
- `ultra` folder: all related to ULTRA
  - `convert_data.py`: convert data for ULTRA format
  - `convert_data.sh`: convert data for ULTRA format (script)
  - `prep_data.sh`: prep data for ULTRA
- `simkgc` folder: all related to SimKGC (also reuses descriptions obtained for ILP, see below)
  - `get_ent_des_label.py`: entity description
  - `get_relation_label.py`: relation description
  - `prep_data.sh`: prep data
- `ilp` folder: all related to ILP
  - `data`: cached values, including frame and frame element descriptions from Framester, and DBpedia entities that have no label or description (in this case, we take the human-readable label from the IRI)
  - `extract_embeddings.py`: using a transformer-based model to extract embeddings from a list of descriptions
  - `add_missing_des.sh`: running the `extract_embeddings.py` script for each file
  - `build_embeddings_ilp.py`: building `embeddings.pkl` for the embeddings model necessary as inputs for ILP
  - `build_index_ilp.py`: building `index.txt` for the embeddings model necessary as inputs for ILP
  - `get_bn_descriptions.py`: descriptions from blank nodes
  - `get_fe_descriptions.py`: descriptions for frame elements
  - `concat_embeddings.py`: updating cached embeddings for more efficiency
  - `run.py`: main, extracts all (data + descriptions)
  - `save_embeddings.sh`: save embeddings for ILP
- `analysis` folder: all related to the analysis of the experiments
  - `analysis.ipynb`: statistical tests for syntax comparison
  - `analysis.py`: correlation text/metrics
  - `helpers_paper.py`: LaTeX table formatting for the paper
  - `metrics_kg.py`: extracting metrics from KGs
- `stats` folder: all metrics from other repositories, to aggregate the results
- `tests` folder: various tests. All files should be run from within that folder.
  - `kg_test` folder: KG used for testing certain functions
  - `tests.py`: tests
- `revs_td.csv`: revolution data
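
The `data` entry of the `ilp` folder notes that, for DBpedia entities without a label or description, the human-readable label is taken from the IRI. As an illustration only, the idea can be sketched as below. The function name `label_from_iri` and the exact rules (use the fragment or the last path segment, decode percent-escapes, replace underscores with spaces) are assumptions and may differ from what the code in this repo does.

```python
from urllib.parse import unquote, urlparse


def label_from_iri(iri: str) -> str:
    """Derive a human-readable label from an IRI.

    Hypothetical helper for illustration; not the repository's actual code.
    """
    parsed = urlparse(iri)
    # Prefer the fragment (e.g. ".../ontology#Thing");
    # otherwise fall back to the last path segment.
    local_name = parsed.fragment or parsed.path.rstrip("/").rsplit("/", 1)[-1]
    # Decode percent-escapes and turn underscores into spaces.
    return unquote(local_name).replace("_", " ").strip()


if __name__ == "__main__":
    # prints: French Revolution
    print(label_from_iri("http://dbpedia.org/resource/French_Revolution"))
```

DBpedia resource IRIs typically encode the page title with underscores, which is why replacing underscores with spaces usually gives a usable label.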