A comprehensive study on reducing hallucinations in Large Language Models through strategic prompt engineering techniques.
## Table of Contents

- Overview
- Problem Statement
- Methodology
- Results
- Installation
- Prompt Strategies
- Evaluation Framework
- Contributing
- Citation
- Team
## Overview

Large Language Models (LLMs) such as GPT-4, Claude, and Mistral demonstrate remarkable capabilities but suffer from hallucinations: generating factually incorrect or fabricated content with high confidence. This project explores prompt engineering strategies as a lightweight, non-invasive way to reduce hallucinations in open-domain question answering.
- 🎯 Best F1 score of 0.15 with the hybrid prompting strategy, up from 0.05 for vanilla QA
- 📉 Significant reduction in hallucination rates across multiple LLMs
- 🔧 Zero-shot approach requiring no model retraining or fine-tuning
- 📊 Comprehensive evaluation across 4+ different LLM architectures
## Problem Statement

Despite advances in model scaling and fine-tuning, LLMs continue to produce hallucinations that:
- Reduce user trust in AI systems
- Create risks in critical domains (healthcare, education, legal)
- Limit practical deployment in production environments
- Generate confident but incorrect responses
**Our Solution:** Develop prompt engineering strategies that improve factual accuracy without requiring access to model internals or expensive retraining.
## Methodology

### Dataset

- Curated QA dataset spanning general knowledge, science, and abstract reasoning
- Balanced across different difficulty levels and domains
- Manual verification of ground truth answers
### Models Evaluated

- GPT-3.5-turbo (baseline)
- Mistral-7B Instruct (open-weight)
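
For the open-weight model, a minimal loading sketch with Hugging Face `transformers` could look like the following; the checkpoint name, precision, and generation settings here are illustrative assumptions, not taken from the project code.

```python
# Illustrative sketch: loading an open-weight Mistral-7B Instruct checkpoint.
# The exact checkpoint version and settings used in the experiments may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 16 GB GPU such as the P100
).to("cuda")

# Mistral Instruct expects [INST] ... [/INST] chat formatting.
prompt = "[INST] Who wrote 'On the Origin of Species'? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```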
### Prompting Strategies Compared

| Strategy | Description | Key Feature |
|---|---|---|
| Vanilla | Direct question-answer format | Baseline approach |
| Chain-of-Thought (CoT) | Step-by-step reasoning | Encourages logical flow |
| Chain-of-Verification (CoVe) | Self-verification stage | Validates answers |
| Hybrid (CoT + CoVe) | Combined reasoning + verification | Best performance |
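
As a rough illustration, the individual CoT and CoVe strategies can be written as prompt templates along the following lines; these are sketches of the general idea, not the exact templates used in the experiments.

```python
# Illustrative prompt templates for the individual strategies (not the exact
# wording used in the experiments).
COT_TEMPLATE = """
Question: {question}
Let's think step-by-step, then state the final answer on its own line.
Final Answer:
"""

COVE_TEMPLATE = """
Question: {question}
First draft an answer. Then list verification questions that check each
factual claim, answer them independently, and revise the draft if any
check fails.
Final Answer:
"""
```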
## Results

| Prompt Strategy | F1 Score ↑ | Recall ↑ | Hallucination Rate ↓ |
|---|---|---|---|
| Vanilla QA | 0.05 | 0.03 | High |
| Chain-of-Thought | 0.11 | 0.08 | Moderate |
| Chain-of-Verification | 0.13 | 0.10 | Low |
| Hybrid (CoT + CoVe) | 0.15 | 0.10 | Lowest |
### Key Findings

- ✅ Hybrid prompting consistently outperformed individual strategies
- ✅ GPT-4 showed best absolute performance with structured prompts
- ✅ Open-source models (Mistral-7B) demonstrated significant improvements
- ✅ Reasoning quality improved alongside factual accuracy
## Installation

```bash
# Clone the repository
git clone https://github.com/meghajbhat/Reducing-Hallucinations-in-LLMs-using-Prompt-Engineering-Strategies.git
cd Reducing-Hallucinations-in-LLMs-using-Prompt-Engineering-Strategies

# Install dependencies (package versions are listed under Requirements below)
pip install -r requirements.txt
```

The experiments were run on Kaggle with a P100 GPU.
### Requirements

```
openai>=1.0.0
anthropic>=0.8.0
transformers>=4.30.0
torch>=2.0.0
pandas>=1.5.0
numpy>=1.21.0
scikit-learn>=1.0.0
matplotlib>=3.5.0
seaborn>=0.11.0
tqdm>=4.64.0
```
## 🎯 Prompt Strategies
### Hybrid Prompt Template
```python
HYBRID_TEMPLATE = """
Question: {question}
Let's think step-by-step:
1. [Reasoning step 1]
2. [Reasoning step 2]
3. [Reasoning step 3]
Now let me verify this answer:
- Does this make logical sense?
- Are there any contradictions?
- Is this consistent with known facts?
Final Answer: [Verified answer]
"""
## Evaluation Framework

- F1 Score: Harmonic mean of precision and recall
- Recall: Coverage of relevant information
- Hallucination Rate: Frequency of factually incorrect statements
- Explanation Coherence: Alignment between reasoning and final answer
- Accuracy: Proportion of answers that match the ground truth
- Precision: Proportion of generated content that is correct
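
For reference, a minimal token-overlap implementation of precision, recall, and F1 (SQuAD-style) is sketched below; the project's actual scoring script may normalize text differently.

```python
# Sketch: token-overlap precision, recall, and F1 for a single QA pair.
from collections import Counter

def token_scores(prediction: str, ground_truth: str) -> dict:
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

print(token_scores("Charles Darwin wrote it in 1859", "Charles Darwin"))
```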
## Contributing

We welcome contributions!
## Citation

If you use this work in your research, please cite this repository.
## Team

| Name | SRN |
|---|---|
| Megha Bhat | PES1UG22CS344 |
| Hrishita Patra | PES1UG22CS241 |
| Jahnavi Bobba | PES1UG22CS246 |
| Keerthi K | PES1UG22CS284 |
**Supervisor:** Dr. Surabhi Narayan, PES University, Bengaluru
For questions or collaboration opportunities:
- 📧 Email: meghajbhat@gmail.com
- 💼 LinkedIn: [Megha Bhat](https://www.linkedin.com/in/megha-bhat-20baaa293/)
⭐ Star this repository if you find it helpful!