Welcome to the Official Repository of Passage Injection!
This repository contains the code, datasets, and models used in our paper: Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation.
Passage Injection is a simple yet effective method that explicitly incorporates retrieved passages into LLMs' reasoning process to enhance robustness against noisy information and improve RAG performance.
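To make the idea concrete, below is a minimal, hypothetical sketch of the prompt construction: the retrieved passages are injected into the model's reasoning segment rather than only into the user prompt. The function name and the template are illustrative assumptions, not the exact ones used in this repository.

def build_injected_prompt(question: str, passages: list[str]) -> str:
    # Illustrative only: number the passages and inject them at the start
    # of the (assumed) <think> reasoning segment, so the model reasons
    # over the external knowledge explicitly before answering.
    context = "\n\n".join(f"Passage {i + 1}: {p}" for i, p in enumerate(passages))
    return (
        f"Question: {question}\n"
        "<think>\n"
        "Let me first read the retrieved passages.\n"
        f"{context}\n"
    )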
conda create -n passage_injection python=3.11.2
conda activate passage_injection
pip install vllm==0.8.5
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.2.post1/flash_attn-2.7.2.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
You can directly use our processed data files in the datasets/ folder, which contain the top-10 retrieved passages for each question.
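For example, loading one of these files might look like the following; the file name and the field names are assumptions for illustration, so check the files in datasets/ for the exact schema.

import json

# Hypothetical example of reading a processed data file; the path and
# the "question"/"passages" field names are assumptions.
with open("datasets/popqa.json") as f:
    data = json.load(f)

sample = data[0]
print(sample["question"])
for passage in sample["passages"]:  # the top-10 retrieved passages
    print(passage)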
If you want to retrieve passages yourself, please follow the steps below (adapted from PRAG).
PopQA
Download the PopQA dataset from its repository https://github.com/AlexTMallen/adaptive-retrieval/blob/main/data/popQA.tsv, and put the file popQA.tsv into the folder data/popqa.
ComplexWebQuestions
Download the ComplexWebQuestions dataset from its repository https://www.dropbox.com/scl/fo/nqujvpg2gc4y0ozkw3wgr/AOzjVEsdUhv2Fx2pamfJlSw?rlkey=746t7xehfqxf1zr867nxiq8aq&e=1, and put the file ComplexWebQuestions_dev.json into the folder data/complexwebquestions.
2WikiMultihopQA
Download the 2WikiMultihopQA dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1, unzip it, and move the folder to data/2wikimultihopqa.
HotpotQA
Download the HotpotQA dataset with the following commands:
mkdir -p data/hotpotqa
wget -P data/hotpotqa/ http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json
Download the Wikipedia dump from the DPR repository using the following commands:
mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd
Use Elasticsearch to index the Wikipedia dump:
cd data
wget -O elasticsearch-8.15.0.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.15.0-linux-x86_64.tar.gz  # download Elasticsearch
tar zxvf elasticsearch-8.15.0.tar.gz
rm elasticsearch-8.15.0.tar.gz
cd elasticsearch-8.15.0
nohup bin/elasticsearch &  # run Elasticsearch in the background
cd ../..
python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki  # build the index
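For reference, the indexing step performed by prep_elastic.py looks roughly like the sketch below, written against the official elasticsearch Python client; the field names ("title", "txt") are assumptions, so consult prep_elastic.py for the actual mapping.

import csv
from elasticsearch import Elasticsearch, helpers

# Rough sketch of bulk-indexing the DPR dump (TSV columns: id, text, title)
# into a BM25-searchable Elasticsearch index; field names are assumptions.
es = Elasticsearch("http://localhost:9200")

def passage_actions(path: str, index_name: str):
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for pid, text, title in reader:
            yield {"_index": index_name, "_id": pid,
                   "_source": {"title": title, "txt": text}}

helpers.bulk(es, passage_actions("data/dpr/psgs_w100.tsv", "wiki"))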
Run the following commands to retrieve passages for each dataset:
python src/prepare.py --dataset popqa --topk 10
python src/prepare.py --dataset complexwebquestions --topk 10
python src/prepare.py --dataset 2wikimultihopqa --topk 10
python src/prepare.py --dataset hotpotqa --topk 10
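Under the hood, each retrieval is a plain BM25 match query against the wiki index. Here is a sketch of the kind of query prepare.py presumably issues; the "txt" field name follows the indexing sketch above and is likewise an assumption.

from elasticsearch import Elasticsearch

# Hypothetical top-k BM25 lookup; see src/prepare.py for the real query.
es = Elasticsearch("http://localhost:9200")

def retrieve(question: str, topk: int = 10) -> list[str]:
    resp = es.search(index="wiki", query={"match": {"txt": question}}, size=topk)
    return [hit["_source"]["txt"] for hit in resp["hits"]["hits"]]

print(retrieve("Who wrote Hamlet?"))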
The following commands evaluate the performance of Passage Injection and other RAG baselines using top-k retrieved passages. Models should be placed in the models/ directory.
# generate predictions for multiple RAG methods
python src/inference.py --model_name Qwen3-32B --topk 5
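inference.py presumably serves the models with vLLM (installed above); a minimal, self-contained generation sketch follows, where the model path under models/ and the sampling values are assumptions.

from vllm import LLM, SamplingParams

# Minimal vLLM generation sketch; the actual prompts, sampling settings,
# and method-specific logic live in src/inference.py.
llm = LLM(model="models/Qwen3-32B")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)
for out in llm.generate(["Question: Who wrote Hamlet?\nAnswer:"], params):
    print(out.outputs[0].text)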
# calculate metrics for the predictions
python src/evaluate.py --model_name Qwen3-32B --topk 5
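The standard open-domain QA metrics are exact match (EM) and token-level F1; whether evaluate.py uses exactly the SQuAD-style normalization below is an assumption, but a typical implementation looks like this:

import re
import string
from collections import Counter

def normalize(s: str) -> str:
    # SQuAD-style normalization: lowercase, drop punctuation and articles,
    # collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)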
Below are commands for additional experiments. The --further_type argument controls the type of injected passages:
random_noise: inject random irrelevant passages
cf_noise: inject counterfactual noisy passages
gold: inject gold (ground-truth) passages
# generate predictions with random noise
python src/infer_further.py --model_name Qwen3-32B --further_type random_noise
# calculate metrics for the predictions
python src/evaluate.py --model_name Qwen3-32B --further_type random_noise
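As a reading aid, the three --further_type settings could be realized along the lines of the hypothetical sketch below; every name here is illustrative, and the actual passage construction is presumably handled inside src/infer_further.py.

import random

def build_injected_passages(further_type: str, retrieved: list[str],
                            gold: list[str], noise_pool: list[str],
                            counterfactual: list[str]) -> list[str]:
    if further_type == "random_noise":
        # random irrelevant passages sampled from an unrelated pool
        return random.sample(noise_pool, k=len(retrieved))
    if further_type == "cf_noise":
        # passages rewritten to support a counterfactual (wrong) answer
        return counterfactual
    if further_type == "gold":
        # ground-truth passages that actually contain the answer
        return gold
    raise ValueError(f"unknown further_type: {further_type}")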