This repository contains the implementation of research on Value-Aligned Confabulation (VAC), a novel approach to evaluating LLM outputs that distinguishes between harmful hallucination and beneficial confabulation that aligns with human values.
Traditional LLM evaluation treats all factually ungrounded outputs as equally problematic "hallucinations." VAC research instead proposes a more nuanced framing:
- Harmful Hallucination: Factually incorrect outputs that mislead or cause harm
- Value-Aligned Confabulation: LLM outputs that are factually ungrounded but align with human values and serve beneficial purposes
- Truthfulness-Utility Trade-off: The balance between factual accuracy and beneficial outcomes
This framing raises three guiding research questions:
- Can LLMs learn to confabulate in ways that align with human values?
- How do we measure the alignment between beneficial confabulation and truthfulness?
- What contextual factors determine when confabulation becomes harmful vs. helpful?
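To make the truthfulness-utility trade-off concrete, the toy sketch below shows one way a composite score could be computed. The weighting scheme, parameter names, and numbers are illustrative assumptions, not the repository's actual metric.

```python
# Purely illustrative: NOT the repository's metric. A VAC-style composite score might
# weigh factual grounding against value alignment, with the weight set by context.

def toy_vac_score(truthfulness: float, value_alignment: float, stakes: str) -> float:
    """Combine two [0, 1] sub-scores; high-stakes contexts weight truthfulness more heavily."""
    weight = 0.9 if stakes == "high" else 0.5  # hypothetical context-dependent weight
    return weight * truthfulness + (1 - weight) * value_alignment

# A comforting but factually ungrounded reply fares reasonably in a low-stakes personal context...
print(round(toy_vac_score(truthfulness=0.3, value_alignment=0.9, stakes="low"), 2))   # 0.6
# ...but poorly when the stakes are high (e.g. medical or legal advice).
print(round(toy_vac_score(truthfulness=0.3, value_alignment=0.9, stakes="high"), 2))  # 0.36
```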
Repository layout:

```
value-aligned-confabulation/
├── docs/                    # Research documentation
├── src/                     # Core implementation
│   ├── evaluation/         # Evaluation framework
│   ├── data/               # Data collection and management
│   ├── models/             # Model implementations
│   └── analysis/           # Analysis tools
├── experiments/            # Experimental protocols
├── tests/                  # Testing framework
├── configs/                # Configuration files
└── scripts/                # Utility scripts
```
Installation:

```bash
pip install -r requirements.txt
python setup.py install
```

Basic usage:

```python
from src.evaluation.vac_evaluator import ValueAlignedConfabulationEvaluator

evaluator = ValueAlignedConfabulationEvaluator()
score = evaluator.evaluate_response(prompt, response, context)
```
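The snippet above leaves `prompt`, `response`, and `context` undefined; the sketch below fills them in so it runs end to end. The example prompt, the context dictionary's fields, and the shape of the returned score are assumptions for illustration, not the evaluator's documented interface.

```python
from src.evaluation.vac_evaluator import ValueAlignedConfabulationEvaluator

evaluator = ValueAlignedConfabulationEvaluator()

# Hypothetical inputs: the context schema below is an assumption, not a documented format.
prompt = "My dog died last week. Do you think she knew I loved her?"
response = (
    "Dogs are remarkably attuned to the people who care for them, and everything "
    "you describe suggests she felt safe and loved."
)
context = {"domain": "emotional_support", "stakes": "low"}

score = evaluator.evaluate_response(prompt, response, context)
print(score)  # e.g. an aggregate VAC score, or a breakdown of value alignment vs. factual grounding
```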
Prefer a friendlier interface? Launch the Streamlit app:

```bash
# From the project root (activate venv first if needed)
python -m pip install -r requirements.txt
streamlit run experiments\pilot_studies\streamlit_app.py
```

The app collects demographics, shows scenario pairs with styled cards, and saves:
- JSON bundle with analysis
- JSONL rows (one per recorded choice)
- CSV table
Files are written to experiments/results/value-elicitation_streamlit/<DATE>/.
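For a quick look at a run's recorded choices, the minimal loading sketch below assumes standard JSONL (one JSON object per line) and uses a placeholder date directory; the per-row field names are not documented here, so it simply prints whatever keys it finds.

```python
import json
from pathlib import Path

# Placeholder date: point this at an actual run directory under experiments/results/.
results_dir = Path("experiments/results/value-elicitation_streamlit") / "2025-01-01"

rows = []
for jsonl_path in sorted(results_dir.glob("*.jsonl")):
    with jsonl_path.open(encoding="utf-8") as f:
        rows.extend(json.loads(line) for line in f if line.strip())

print(f"Loaded {len(rows)} recorded choices")
if rows:
    print("Available fields:", sorted(rows[0].keys()))
```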
Research roadmap:
- Core evaluation framework
- Initial benchmark scenarios
- Basic metrics implementation
- Value elicitation study
- Expert judgment collection
- Baseline human preferences
- Baseline model evaluation
- Cross-domain testing
- Alignment-truthfulness trade-off analysis
- Statistical analysis
- Metric refinement
- Research publication preparation
This is a research project focused on advancing our understanding of beneficial AI confabulation. We welcome contributions from researchers, developers, and AI safety practitioners.
- Research: New evaluation metrics, benchmark scenarios, human study protocols
- Technical: Code improvements, integrations, analysis tools
- Documentation: Methodology improvements, examples, tutorials
- Community: Cross-cultural validation, expert reviews, ethical guidelines
Please see our Contributing Guide for detailed information on how to get involved.
This project follows ethical guidelines for human subjects research and AI safety. All contributions should consider potential societal impacts and promote beneficial uses of confabulation research.
This research builds upon important insights from the AI research community:
- Geoffrey Hinton has advocated for using "confabulation" rather than "hallucination" when describing AI-generated content that isn't grounded in training data, emphasizing that the term better captures the nature of how language models generate responses. See his discussion in the 60 Minutes interview and the full interview.
- Andrej Karpathy has discussed the nuanced nature of what we call "hallucinations" in language models, noting that not all factually ungrounded outputs are equally problematic - a key insight that motivates this research. His thoughts on this topic have been shared in various Twitter/X discussions.
- This research was originally conceptualized in "Hallucinations in Large Language Models" (Ashioya, 2024), which explored the need for more nuanced evaluation of AI-generated content.
We acknowledge the broader AI safety and alignment research community, whose ongoing work on AI evaluation, human preference modeling, and value alignment provides the foundation for this research.
MIT License - See LICENSE file for details.
If you use this work in your research, please cite:
```bibtex
@misc{vac_research_2025,
  title={Value-Aligned Confabulation: Moving Beyond Binary Truthfulness in LLM Evaluation},
  author={Ashioya Jotham Victor},
  year={2025},
  note={Research in progress}
}
```