
SalesforceAIResearch/Unanswerability_RAGE

Unanswerability Evaluation for Retrieval Augmented Generation

Paper: https://arxiv.org/abs/2412.12300

Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on answerable queries and overlook the importance of appropriately rejecting unanswerable requests. In this project, we introduce UAEval4RAG, a framework designed to evaluate whether RAG systems handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries for any given knowledge base, scoring systems with two metrics: unanswered ratio and acceptable ratio. We conduct experiments with various RAG components, including retrieval models, rewriting methods, rerankers, language models, and prompting strategies, and reveal hidden trade-offs in the performance of RAG systems. Our findings highlight the critical role of component selection and prompt design in balancing accuracy on answerable queries with high rejection rates on unanswerable ones. UAEval4RAG provides insights and tools for developing more robust and reliable RAG systems.

Setup

pyenv virtualenv 3.11 unans-rage
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
pyenv activate unans-rage
pip install -r requirements.txt

Usage

Generate unanswerable queries

1. For the first five categories

  1. Set your OpenAI API key: export OPENAI_API_KEY='yourkey'
  2. See ./src/taxonomy/unanswerable_generation.py for an example
# 1. Specify the share of queries drawn from each unanswerable category.
contribution = {
    "incomprehensible": 0.2,
    "false_presuppositions": 0.2,
    "underspecified": 0.2,
    "safety-concern": 0.2,
    "modality-limited": 0.2,
}

# 2. Specify the folder where the generated data will be saved.
save_path_folder = "../data/output-folder"
# 3. Specify the knowledge-base (database) folder.
folder = "ur-path-to/database"
# 4. Specify the test size (total number of generated samples).
test_size = 300

# Generate the unanswerable queries (this is a coroutine; see the driver sketch below).
await generate_unanswerable_batch(
    folder=folder,
    test_size=test_size,
    contribution=contribution,
    save_path_folder=save_path_folder,
    generator_llm="gpt-4o",
)
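Because generate_unanswerable_batch is asynchronous, the snippet above must run inside an async context. A minimal driver sketch is shown below; the import path is an assumption based on the file location mentioned above, so adjust it to your checkout layout.

import asyncio

# Assumed import path; adapt to how you run the repo (e.g., from the src/ directory).
from src.taxonomy.unanswerable_generation import generate_unanswerable_batch

async def main():
    contribution = {
        "incomprehensible": 0.2,
        "false_presuppositions": 0.2,
        "underspecified": 0.2,
        "safety-concern": 0.2,
        "modality-limited": 0.2,
    }
    await generate_unanswerable_batch(
        folder="ur-path-to/database",
        test_size=300,
        contribution=contribution,
        save_path_folder="../data/output-folder",
        generator_llm="gpt-4o",
    )

if __name__ == "__main__":
    asyncio.run(main())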

2. For the OOD category

  1. Set your OpenAI API key: export OPENAI_API_KEY='yourkey'
  2. See the example in ./src/ood/tenant_dataset_main.py

Evaluation

  1. Set your OpenAI API key: export OPENAI_API_KEY='yourkey'
  2. See the example in ./src/taxonomy/eval_unanswerable_harness.py
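The harness reports the two metrics named in the abstract, unanswered ratio and acceptable ratio. The toy sketch below is only an illustration of how such per-query judge labels could be aggregated; the exact definitions used here are assumptions, and eval_unanswerable_harness.py and the paper are the authoritative reference.

from typing import Dict, List

def unanswerable_metrics(records: List[Dict[str, bool]]) -> Dict[str, float]:
    """Aggregate per-query labels for unanswerable queries.

    Assumed semantics (see the paper for the official definitions):
      - "abstained": the RAG system did not attempt a direct answer.
      - "acceptable": a judge deemed the system's response acceptable.
    """
    n = len(records)
    if n == 0:
        return {"unanswered_ratio": 0.0, "acceptable_ratio": 0.0}
    unanswered = sum(r["abstained"] for r in records)
    acceptable = sum(r["acceptable"] for r in records)
    return {
        "unanswered_ratio": unanswered / n,
        "acceptable_ratio": acceptable / n,
    }

# Example with three unanswerable queries:
print(unanswerable_metrics([
    {"abstained": True,  "acceptable": True},
    {"abstained": True,  "acceptable": False},
    {"abstained": False, "acceptable": False},
]))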

Citation

@article{peng2024unanswerability,
  title={Unanswerability Evaluation for Retrieval Augmented Generation},
  author={Peng, Xiangyu and Choubey, Prafulla Kumar and Xiong, Caiming and Wu, Chien-Sheng},
  journal={arXiv preprint arXiv:2412.12300},
  year={2024}
}
