From Symbols to Numbers: Measuring the Impact of Narrative Complexity on Embeddings

Code accompanying the paper "Measuring the Impact of Narrative Complexity on Knowledge Graph Embeddings", submitted to ISWC 2025.

In this paper, we investigate how the semantic and syntactic levels of a narrative impact embedding performance. This repo contains all the code needed to construct the KGs and to prepare the data for training with the following three methods: ULTRA, SimKGC, and ILP.

The examples folder contains three example KGs constructed on revolutions. Upon acceptance of the paper, we will release all constructed KGs, as well as the resulting datasets for the experiments, on Zenodo.

Detailed description of the content:

Main

kg_construction folder: everything related to KG construction. All files should be run from within that folder.
- causation.py: extracting causation layer
- causation.sh: extract causation narrative layer (script)
- config_complex.json: config used for subevent extraction
- representations.py: class for KG conversion
- convert_kg.py: converting KG to different syntaxes (script)
- frames.py: extracting role layer
- gsf.py: extracting base, prop, subevent, text layers
- utils.py: utils (generic)
prep_data.py: preparing data (see file for arguments)
utils_inductive.py: utils (inductive splits)
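
To illustrate what an inductive split means in this context, here is a minimal sketch under stated assumptions. It uses one common definition (entities in the test graph never appear in the training graph) and is not taken from utils_inductive.py, which may implement the split differently. The function name `inductive_split` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def inductive_split(
    triples: List[Triple], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Triple], List[Triple]]:
    """Hypothetical helper: split triples so train and test share no entity."""
    # Collect and shuffle entities deterministically.
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    rng = random.Random(seed)
    rng.shuffle(entities)

    # Reserve a fraction of the entities for the test graph.
    n_test = max(1, int(len(entities) * test_ratio))
    test_entities = set(entities[:n_test])

    train, test = [], []
    for h, r, t in triples:
        if h in test_entities and t in test_entities:
            test.append((h, r, t))
        elif h not in test_entities and t not in test_entities:
            train.append((h, r, t))
        # Triples that bridge the two entity sets are dropped.
    return train, test
```

Dropping the bridging triples keeps the two graphs fully disjoint at the entity level, at the cost of losing some data. Other inductive settings keep them instead.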

Methods (ULTRA, SimKGC, ILP)

ultra folder: all related to ULTRA
- convert_data.py: convert data for ULTRA format
- convert_data.sh: convert data for ULTRA format (script)
- prep_data.sh: prep data for ULTRA
simkgc folder: all related to SimKGC (also reuses descriptions obtained for ILP, see below)
- get_ent_des_label.py: entity description
- get_relation_label.py: relation description
- prep_data.sh: prep data
ilp folder: all related to ILP
- data: cached values, including frame and frame element descriptions from Framester, and the DBpedia entities that have no label or description (for these, we take the human-readable label from the IRI)
- extract_embeddings.py: using a transformer-based model to extract embeddings from a list of descriptions
- add_missing_des.sh: running extract_embeddings.py script for each file
- build_embeddings_ilp.py: building the embeddings.pkl file for the embedding model, required as input for ILP
- build_index_ilp.py: building the index.txt file for the embedding model, required as input for ILP
- get_bn_descriptions.py: descriptions from blank nodes
- get_fe_descriptions.py: descriptions for frame elements
- concat_embeddings.py: updating cached embeddings for more efficiency
- run.py: main, extracts all (data + descriptions)
- save_embeddings.sh: save embeddings for ILP
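
The fallback mentioned for the data folder (taking the human-readable label from the IRI when a DBpedia entity has no label or description) can be sketched as follows. This is an illustrative assumption about how such a fallback might look, not the code used in this repo. The function name `label_from_iri` is hypothetical.

```python
from urllib.parse import unquote, urlparse


def label_from_iri(iri: str) -> str:
    """Hypothetical helper: derive a readable label from the last IRI segment."""
    parsed = urlparse(iri)
    # Prefer the fragment (after '#'), else the last path segment.
    local = parsed.fragment or parsed.path.rstrip("/").rsplit("/", 1)[-1]
    # Decode percent-escapes and turn underscores into spaces.
    return unquote(local).replace("_", " ").strip()


print(label_from_iri("http://dbpedia.org/resource/French_Revolution"))
# French Revolution
```

The resulting string can then be used in place of the missing description before extracting embeddings.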

Other (analysis, tests, etc)

analysis folder: all related to the analysis of the experiments
- analysis.ipynb: statistical tests for syntax comparison
- analysis.py: correlation text/metrics
- helpers_paper.py: latex table formatting for the paper
- metrics_kg.py: extracting metrics from KGs
stats folder: all metrics from other repositories, to aggregate the results
tests folder: various tests. All files should be run from within that folder.
- kg_testfolder: KG used for testing certain functions
- tests.py: tests
revs_td.csv: revolution data

Repository: SonyCSLParis/complex_kg_embeddings

Folders and files

Latest commit

History

Repository files navigation

From Symbols to Numbers: Measuring the Impact of Narrative Complexity on Embeddings

Main

Methods (ULTRA, SimKGC, ILP)

Other (analysis, tests, etc)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages