Value-Aligned Confabulation (VAC) Research

Overview

This repository contains the implementation of research on Value-Aligned Confabulation (VAC) - a novel approach to evaluating LLM outputs that distinguishes harmful hallucination from beneficial confabulation that aligns with human values.

Core Concept

Traditional LLM evaluation treats all factually ungrounded outputs as equally problematic "hallucinations." VAC research instead rests on three concepts:

  • Harmful Hallucination: Factually incorrect outputs that mislead or cause harm
  • Value-Aligned Confabulation: LLM outputs that are factually ungrounded but align with human values and serve beneficial purposes
  • Truthfulness-Utility Trade-off: The balance between factual accuracy and beneficial outcomes
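
The truthfulness-utility trade-off can be pictured as a weighted blend of a factuality score and a value-alignment score. The sketch below is purely illustrative - the function name, weights, and [0, 1] score ranges are assumptions for exposition, not the metric implemented in src/evaluation/.

# Illustrative only: not the repository's metric. The weights and [0, 1]
# score ranges are assumptions for exposition.
def combined_vac_score(truthfulness: float, value_alignment: float,
                       truth_weight: float = 0.5) -> float:
    """Weighted blend of factual grounding and value alignment, both in [0, 1]."""
    return truth_weight * truthfulness + (1.0 - truth_weight) * value_alignment

# A factually loose but comforting answer can outscore a blunt factual one
# when the context weights utility heavily (e.g. emotional support).
print(combined_vac_score(0.4, 0.9, truth_weight=0.3))  # ≈ 0.75
print(combined_vac_score(0.9, 0.3, truth_weight=0.3))  # ≈ 0.48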

Key Research Questions

  1. Can LLMs learn to confabulate in ways that align with human values?
  2. How do we measure the alignment between beneficial confabulation and truthfulness?
  3. What contextual factors determine when confabulation becomes harmful vs. helpful?

Repository Structure

value-aligned-confabulation/
├── docs/                    # Research documentation
├── src/                     # Core implementation
│   ├── evaluation/         # Evaluation framework
│   ├── data/               # Data collection and management
│   ├── models/             # Model implementations
│   └── analysis/           # Analysis tools
├── experiments/            # Experimental protocols
├── tests/                  # Testing framework
├── configs/                # Configuration files
└── scripts/                # Utility scripts

Installation

pip install -r requirements.txt
python setup.py install

Quick Start

from src.evaluation.vac_evaluator import ValueAlignedConfabulationEvaluator

evaluator = ValueAlignedConfabulationEvaluator()
score = evaluator.evaluate_response(prompt, response, context)
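
For batch evaluation over several (prompt, response, context) triples, a plain loop over evaluate_response is enough. The scenarios below are placeholders, the structure of context (a short domain label here) is an assumption rather than a documented schema, and the return type of evaluate_response depends on the evaluator.

from src.evaluation.vac_evaluator import ValueAlignedConfabulationEvaluator

evaluator = ValueAlignedConfabulationEvaluator()

# Placeholder scenarios; the context values are assumptions, not a documented schema.
scenarios = [
    ("How do I comfort a grieving friend?", "You might say ...", "emotional_support"),
    ("What year was the Apollo 11 landing?", "1969.", "factual_qa"),
]

scores = [evaluator.evaluate_response(prompt, response, context)
          for prompt, response, context in scenarios]
print(scores)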

Web UI (Streamlit) for Value Elicitation

Prefer a friendlier interface? Launch the Streamlit app:

# From the project root (activate venv first if needed)
python -m pip install -r requirements.txt
streamlit run experiments/pilot_studies/streamlit_app.py

The app collects demographics, shows scenario pairs with styled cards, and saves:

  • JSON bundle with analysis
  • JSONL rows (one per recorded choice)
  • CSV table

Files are written to experiments/results/value-elicitation_streamlit/<DATE>/.
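
For downstream analysis, the per-choice JSONL rows can be loaded into a single table. A minimal sketch using pandas (an extra dependency here), assuming only the output layout described above; the columns will be whatever fields the app writes:

import glob
import json

import pandas as pd

rows = []
# <DATE> above is a placeholder; glob over all recorded sessions.
for path in glob.glob("experiments/results/value-elicitation_streamlit/*/*.jsonl"):
    with open(path, encoding="utf-8") as f:
        rows.extend(json.loads(line) for line in f if line.strip())

df = pd.DataFrame(rows)  # one row per recorded choice
print(df.head())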

Research Phases

Phase 1: Foundation (Weeks 1-2)

  • Core evaluation framework
  • Initial benchmark scenarios
  • Basic metrics implementation

Phase 2: Human Studies (Weeks 3-4)

  • Value elicitation study
  • Expert judgment collection
  • Baseline human preferences

Phase 3: Model Evaluation (Weeks 5-6)

  • Baseline model evaluation
  • Cross-domain testing
  • Alignment-truthfulness trade-off analysis

Phase 4: Analysis & Iteration (Weeks 7-8)

  • Statistical analysis
  • Metric refinement
  • Research publication preparation

Contributing

This is a research project focused on advancing our understanding of beneficial AI confabulation. We welcome contributions from researchers, developers, and AI safety practitioners.

Ways to Contribute

  • Research: New evaluation metrics, benchmark scenarios, human study protocols
  • Technical: Code improvements, integrations, analysis tools
  • Documentation: Methodology improvements, examples, tutorials
  • Community: Cross-cultural validation, expert reviews, ethical guidelines

Please see our Contributing Guide for detailed information on how to get involved.

Research Ethics

This project follows ethical guidelines for human subjects research and AI safety. All contributions should consider potential societal impacts and promote beneficial uses of confabulation research.

Acknowledgements

This research builds upon important insights from the AI research community:

Terminology

  • Geoffrey Hinton has advocated for using "confabulation" rather than "hallucination" when describing AI-generated content that isn't grounded in training data, emphasizing that the term better captures the nature of how language models generate responses. See his discussion in the 60 Minutes interview and the full interview.

  • Andrej Karpathy has discussed the nuanced nature of what we call "hallucinations" in language models, noting that not all factually ungrounded outputs are equally problematic - a key insight that motivates this research. His thoughts on this topic have been shared in various Twitter/X discussions.

Research Community

We acknowledge the broader AI safety and alignment research community, whose ongoing work on AI evaluation, human preference modeling, and value alignment provides the foundation for this research.

License

MIT License - See LICENSE file for details.

Citation

If you use this work in your research, please cite:

@misc{vac_research_2025,
  title={Value-Aligned Confabulation: Moving Beyond Binary Truthfulness in LLM Evaluation},
  author={Ashioya Jotham Victor},
  year={2025},
  note={Research in progress}
}
