This repository is the official implementation of KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs.
We propose KG-Infused RAG, a framework that integrates knowledge graphs (KGs) into RAG systems to implement 🧠 spreading activation, a cognitive process that enables concept association and inference.
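As a rough illustration of spreading activation over a KG, the sketch below expands outward from seed entities round by round and accumulates the triples it visits. It is not the repository's implementation: the `spread_activation` function, the `(head, relation, tail)` triple layout, and the `max_rounds` parameter are assumptions made for this example, and any filtering of triples for relevance to the question is omitted.

```python
from collections import defaultdict


def spread_activation(triples, seeds, max_rounds=2):
    """Toy spreading activation over (head, relation, tail) triples.

    Returns the accumulated subgraph as a list of triples in visiting order.
    """
    # Index outgoing edges by head entity for fast lookup.
    outgoing = defaultdict(list)
    for head, relation, tail in triples:
        outgoing[head].append((head, relation, tail))

    frontier = list(dict.fromkeys(seeds))  # entities activated last round
    activated = set(frontier)              # every entity activated so far
    subgraph = []                          # accumulated triples

    for _ in range(max_rounds):
        next_frontier = []
        for entity in frontier:
            for triple in outgoing[entity]:
                subgraph.append(triple)
                tail = triple[2]
                # Activate each entity at most once.
                if tail not in activated:
                    activated.add(tail)
                    next_frontier.append(tail)
        if not next_frontier:
            break
        frontier = next_frontier
    return subgraph
```

Calling `spread_activation(triples, ["Marie Curie"], max_rounds=2)` on a small hand-written triple list returns the triples reachable within two hops of the seed entity; raising `max_rounds` lets the activation spread further.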
Below is an example illustrating the accumulated subgraph constructed through KG-guided spreading activation. Due to space limitations, only a portion of the subgraph is shown, and some activated entities are omitted.
Step 1: Clone this repo
git clone git@github.com:thunlp/KG-Infused-RAG.git
cd KG-Infused-RAG
Step 2: Create environment and install dependencies
conda create -n kg-infused-rag python=3.10
conda activate kg-infused-rag
pip install -r requirements.txt
pip install -e .
All evaluation and training data can be downloaded here. Place the data under the /data/datasets directory. The data is organized into two parts:
- Evaluation Data: Includes the original test set and the corresponding initial retrieval results from both the corpus and the knowledge graph.
- Training Data: Contains the original sampling outputs on the training set, as well as the constructed DPO training data derived from them.
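To make the relationship between sampled outputs and DPO data more concrete, here is a minimal sketch of one common way to turn scored samples into preference pairs. The record keys (`prompt`, `output`, `score`, `chosen`, `rejected`), the `build_dpo_pairs` helper, and the best-versus-worst pairing rule are assumptions for illustration and may not match the format or construction procedure of the downloadable files.

```python
def build_dpo_pairs(samples, min_gap=0.0):
    """Pair the best- and worst-scored sample for each prompt.

    `samples` is a list of dicts with hypothetical keys
    "prompt", "output", and "score" (higher is better).
    """
    # Group sampled outputs by the prompt they answer.
    by_prompt = {}
    for sample in samples:
        by_prompt.setdefault(sample["prompt"], []).append(sample)

    pairs = []
    for prompt, group in by_prompt.items():
        best = max(group, key=lambda s: s["score"])
        worst = min(group, key=lambda s: s["score"])
        # Skip prompts whose samples are not clearly separated.
        if best["score"] - worst["score"] <= min_gap:
            continue
        pairs.append(
            {"prompt": prompt, "chosen": best["output"], "rejected": worst["output"]}
        )
    return pairs
```

Prompts whose samples all receive the same score produce no pair, since there is no preference signal to learn from.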
Download the corpus, unzip the file, and place the extracted data under the /data/corpus directory:
wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
gunzip psgs_w100.tsv.gz
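After extraction, it can be useful to check that the corpus reads correctly. The sketch below assumes the file is tab-separated with a header row naming the columns `id`, `text`, and `title`; the `iter_passages` helper is a hypothetical name and is not part of this repository.

```python
import csv


def iter_passages(path):
    """Yield (passage_id, title, text) from a TSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield row["id"], row["title"], row["text"]


# Example usage (path assumed from the instructions above):
# for passage in iter_passages("/data/corpus/psgs_w100.tsv"):
#     print(passage)
#     break
```

Because the function is a generator, it streams rows one at a time instead of loading the whole corpus into memory.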
The knowledge graph used in our experiments is available via both 🤗 Hugging Face and ModelScope. Download the KG (Wikidata5M-KG), extract the archive, and place the extracted data under the /data/KG directory:
tar -xzvf wikidata5m_kg.tar.gz
Generate embeddings for the passages in the corpus and for the entity descriptions in Wikidata5M-KG:
bash ./scripts/generate_embeddings_corpus.sh
bash ./scripts/generate_embeddings_kg.sh
- Retriever: Contriever-MS MARCO
- Generator: Qwen2.5-7B and LLaMA3.1-8B
Before running the main pipeline, you need to perform an initial retrieval step to obtain the top-k passages and entities for each input question:
bash ./scripts/retrieval.sh
💡 Precomputed retrieval results are available here (see Evaluation Data in Datasets).
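Conceptually, the initial retrieval step ranks passages or entities by embedding similarity to the question and keeps the top k. The sketch below shows that ranking with plain Python lists, assuming dot-product similarity over precomputed embeddings; `top_k` and its arguments are hypothetical names, not the interface of `retrieval.sh`.

```python
import heapq


def top_k(query_vec, item_vecs, k=5):
    """Return the ids of the k items with the highest dot product to the query.

    `item_vecs` maps an item id (passage or entity) to its embedding.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Score every item lazily, then keep only the k best.
    scored = ((dot(query_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]
```

This linear scan is only meant to show the ranking logic; a run over millions of passages would normally rely on a vector index instead.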
bash ./scripts/kg_aug_rag/kg_aug_rag.sh
We are currently organizing the code.
If you find our code, data, models, or the paper useful, please cite the paper:
@article{wu2025kg,
title={KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs},
author={Wu, Dingjun and Yan, Yukun and Liu, Zhenghao and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2506.09542},
year={2025}
}