Overview
I've implemented a modular Python package for evaluating the factuality of LLM responses based on the research paper "Long-form factuality in large language models" by Wei et al. (2024). This package (SAF-Eval) breaks down responses into atomic facts and evaluates them against retrieved evidence to provide detailed factuality scoring.
https://github.com/chandralegend/saf-eval
Features implemented
- 🔍 Atomic Fact Extraction: Extracts atomic factual claims from responses with optional few-shot learning
- ✅ Self-Containment Processing: Automatically detects and fixes non-self-contained facts
- 🧠 Flexible Retrieval: Integrates with any document retrieval system (with a simple reference implementation)
- 🔄 Fact Deduplication: Eliminates redundant or highly similar facts
- 📊 Comprehensive Evaluation: Classifies facts and calculates factuality scores
- 📝 Detailed Logging: Full pipeline logging for analysis and debugging
- 🧩 Modular Architecture: All components can be customized or replaced
Technical details
- Fully async implementation
- Provider-agnostic (works with any LLM)
- Supports OpenAI's structured output format
- Includes a unified configuration system
- Comes with a comprehensive test suite
- Includes example scripts for all key features (a rough sketch of the async evaluation flow follows below)
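
For orientation, here is a minimal, self-contained sketch of that evaluation flow. All names and stub functions below are illustrative placeholders rather than SAF-Eval's actual API; see the repository's example scripts for the real interface.

```python
# Illustrative sketch of the async evaluation flow (not SAF-Eval's API).
# The LLM and retrieval calls are stubbed out so the skeleton runs as-is.
import asyncio
from dataclasses import dataclass


@dataclass
class FactResult:
    fact: str
    label: str  # e.g. "supported", "not_supported", "irrelevant"


async def extract_atomic_facts(response: str) -> list[str]:
    # In the real pipeline an LLM splits the response into atomic claims
    # (self-containment fixing and deduplication are omitted here for brevity).
    return [s.strip() for s in response.split(".") if s.strip()]


async def retrieve_evidence(fact: str) -> list[str]:
    # Placeholder for any document retrieval backend.
    return [f"evidence for: {fact}"]


async def classify_fact(fact: str, evidence: list[str]) -> FactResult:
    # Placeholder for the LLM-based classification step.
    return FactResult(fact=fact, label="supported")


async def evaluate(response: str) -> list[FactResult]:
    facts = await extract_atomic_facts(response)

    # Facts are independent, so retrieval and classification can run concurrently.
    async def _check(fact: str) -> FactResult:
        evidence = await retrieve_evidence(fact)
        return await classify_fact(fact, evidence)

    return await asyncio.gather(*(_check(f) for f in facts))


if __name__ == "__main__":
    results = asyncio.run(
        evaluate("Paris is the capital of France. The Seine flows through it.")
    )
    for result in results:
        print(result)
```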
This implementation follows the methodology outlined in the cited paper, in which responses are evaluated by:
- Breaking them down into atomic facts
- Ensuring each fact is self-contained
- Deduplicating similar facts
- Retrieving evidence for each fact
- Classifying facts based on evidence
- Calculating an overall factuality score (see the F1@K sketch below)
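
For the scoring step, the cited paper aggregates per-fact labels into an F1@K score: precision over the classified facts combined with recall against a target count K of supported facts. A minimal sketch of that aggregation (the function name and example numbers are illustrative):

```python
# F1@K aggregation as described in Wei et al. (2024): precision over the
# classified facts, recall measured against a target count K of supported facts.
def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


# Example: 38 supported facts, 4 not supported, K = 64  ->  ~0.717
print(round(f1_at_k(38, 4, 64), 3))
```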
I'd appreciate feedback on the implementation and would welcome any suggestions for improvements or additional features.