KnowOrNot is an open-source framework that enables users to develop their own customized evaluation data and pipelines for assessing out-of-knowledge-base robustness, i.e. whether large language models (LLMs) can properly recognize the boundaries of their knowledge and abstain from answering when they don't know the answer.
- Unified, high-level API that streamlines setting up and running robustness benchmarks (only a source document is required to get the pipeline running)
- Modular architecture that emphasizes extensibility and flexibility, allowing users to easily integrate their own LLM clients and RAG settings
- Rigorous data modeling design that ensures experiment reproducibility, reliability, and traceability
- Comprehensive suite of tools for users to customize their pipelines
- Create and activate a virtual environment, then install `uv`.

  ```bash
  python3 -m venv knowornot
  source knowornot/bin/activate
  pip install uv
  ```
- Download the source code and enter the created source directory.

  ```bash
  git clone git@github.com:govtech-responsibleai/KnowOrNot.git
  cd KnowOrNot
  ```
- Install the library.

  ```bash
  uv pip install .
  ```
- Set up environment variables in a `.env` file, depending on the LLM provider of choice. The sample script `example/quickstart_pipeline.py` depends on OpenAI and therefore requires OpenAI environment variables. Refer to `env.example` for an example (a minimal illustration follows these steps).

- Run a sample evaluation pipeline.

  ```bash
  uv run python example/quickstart_pipeline.py
  ```

Refer to `quickstart.md` for more information and `quickstart_pipeline.py` for an end-to-end example flow.
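The authoritative list of required variables lives in `env.example`; as a rough illustration only, an OpenAI-based run would typically need something like the following (the `OPENAI_API_KEY` name is an assumption based on the standard OpenAI SDK, not taken from this repository):

```
# .env -- illustrative only; consult env.example for the actual variable names
OPENAI_API_KEY=<your-openai-api-key>
```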
KnowOrNot supports multiple LLM providers; register the client for your provider of choice (a registration sketch follows the list):

- OpenAI: use the `add_openai()` method
- Gemini API: use the `add_gemini()` method
- Azure: use the `add_azure()` method
- OpenRouter: use the `add_openrouter()` method
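As a rough sketch of how registration might look in a script (the `KnowOrNot` entry-point class name and the zero-argument calls are assumptions; only the `add_*()` method names come from the list above, and `example/quickstart_pipeline.py` shows the real flow):

```python
from knowornot import KnowOrNot  # assumed import path

kon = KnowOrNot()         # hypothetical entry-point class
kon.add_openai()          # register an OpenAI client (assumed to read credentials from .env)
# kon.add_gemini()        # Gemini API
# kon.add_azure()         # Azure OpenAI
# kon.add_openrouter()    # OpenRouter
```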
Experiments can be run and evaluated either asynchronously or synchronously (a sketch follows the list):

- Asynchronous: use the `run_experiment_async` and `evaluate_experiment_async` methods
- Synchronous: use the `run_experiment` and `evaluate_experiment` methods
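A hedged sketch of the two invocation styles; the argument and return shapes are placeholders, and only the method names are taken from the list above:

```python
import asyncio

async def run_async(kon, experiment_input):
    # Await the asynchronous variants, e.g. when driving many experiments concurrently.
    outputs = await kon.run_experiment_async(experiment_input)
    return await kon.evaluate_experiment_async(outputs)

def run_sync(kon, experiment_input):
    # The synchronous variants are convenient in simple, linear scripts.
    outputs = kon.run_experiment(experiment_input)
    return kon.evaluate_experiment(outputs)

# results = asyncio.run(run_async(kon, experiment_input))
```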
Two approaches are available for generating questions that probe the knowledge boundary (a plain-Python illustration of leave-one-out follows the list):

- Leave-one-out: Remove a fact from the model's context and test whether it can still answer correctly (measures memorization)
- Random (Synthetic): Create questions the model shouldn't know based on the provided context (tests abstention capability)
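To make the leave-one-out idea concrete, here is a conceptual, plain-Python illustration of the held-out loop (this is not the library's API):

```python
facts = ["fact A", "fact B", "fact C"]  # toy stand-ins for extracted facts

for i, held_out in enumerate(facts):
    context = facts[:i] + facts[i + 1:]  # every fact except the one under test
    # Ask the model a question whose answer is `held_out`, giving it only `context`.
    # A model that still answers correctly has likely memorized the fact;
    # a well-calibrated model should abstain instead.
    print(f"held out: {held_out!r}; context: {context}")
```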
Four retrieval strategies determine what context the model sees at question time (a selection sketch follows the list):

- DIRECT: No context provided; tests raw model knowledge
- BASIC_RAG: Provides semantically relevant context using embedding similarity
- LONG_IN_CONTEXT: Provides all available context
- HYDE_RAG: Uses hypothetical document embeddings (HyDE) for more effective retrieval
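Strategy selection might look something like the following; the `RetrievalType` enum and the `create_experiment` method shown here are purely hypothetical placeholders, and only the four strategy names come from the list above:

```python
from knowornot import RetrievalType  # hypothetical import

experiment = kon.create_experiment(           # hypothetical method
    questions=questions,
    retrieval_type=RetrievalType.BASIC_RAG,   # or DIRECT, LONG_IN_CONTEXT, HYDE_RAG
)
```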
For evaluation, you can (a small metric sketch follows the list):

- Create custom evaluation metrics
- Measure abstention rates, hallucination tendencies, and answer accuracy
- Compare performance across different retrieval methods and models
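For instance, an abstention rate is simply the fraction of evaluated responses judged to be abstentions; a plain-Python sketch (the label values are hypothetical, not KnowOrNot's own schema):

```python
def abstention_rate(labels: list[str]) -> float:
    """Fraction of responses labelled 'abstain' (a hypothetical label value)."""
    return sum(label == "abstain" for label in labels) / len(labels) if labels else 0.0

print(abstention_rate(["abstain", "answer", "abstain", "answer"]))  # 0.5
```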
KnowOrNot consists of several integrated components that generate and manage the pipeline's data artifacts (a rough data-flow sketch follows the list):
- FactManager: Extracts structured facts from documents
- QuestionExtractor: Generates diverse question-answer pairs
- ExperimentManager: Creates and runs knowledge boundary experiments
- RetrievalStrategies: Implements different context retrieval methods
- Evaluator: Assesses model responses with customizable metrics
- DataLabeller: Orchestrates the human labelling process to validate LLM evaluations
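A rough picture of how data flows through these components; the ordering is inferred from the descriptions above and method calls are deliberately omitted:

```python
# source document
#   -> FactManager           extracts structured facts
#   -> QuestionExtractor     turns facts into question-answer pairs
#   -> ExperimentManager     builds and runs knowledge-boundary experiments,
#                            using RetrievalStrategies to assemble per-question context
#   -> Evaluator             scores model responses with customizable metrics
#   -> DataLabeller          collects human labels to validate the LLM evaluations
```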
This project is licensed under the Creative Commons Attribution 4.0 International License.