Official implementation of "Latent-Space Verification for Self-Correcting LLMs", which introduces a novel approach to enhance the factual reliability of Large Language Models through embedded verification mechanisms in latent space.
Note: A ready-to-use verification-enhanced model can be downloaded from Hugging Face (`jacobpwarren/Qwen2.5-7B-Latent_Verification`).
- Overview
- Key Features and Contributions
- Method
- Installation
- Quick Start
- Usage
- Experiments and Results
- Test Suite Usage
- Repository Structure
- Citation
- License
- Acknowledgments
Large Language Models (LLMs) excel at generating coherent text but often produce factual inaccuracies or hallucinations, limiting their reliability in critical applications. This repository introduces Latent-Space Verification, a novel approach that embeds verification mechanisms directly into the hidden layers of pre-trained transformers. By detecting and rectifying inaccuracies within latent representations before output generation, my method enhances factual accuracy while requiring minimal additional parameters.
- Latent-Space Verification: My mechanism enables models to identify and rectify inaccuracies within their latent representations before generating outputs.
- Parameter-Efficient Implementation: Using LoRA-style adapters with only a 0.08% parameter increase (6.3M parameters on a 7.6B model).
- Significant Performance Gains: Improves factual accuracy from 55.81% to 65.48% (nearly 10% absolute improvement).
- Architectural Innovations:
- Residual Verification Networks with a mixture-of-experts approach
- Uncertainty-weighted Bayesian corrections
- Hierarchical verification across different abstraction levels
- Extensive Analysis: Direct evidence of “thinking in latent space” through visualization of hidden state dynamics, attention patterns, and more.
My method introduces verification modules into specific layers of a frozen pre-trained language model; a minimal sketch of the core adapter idea follows the component list below. These modules:
- Intercept hidden states at strategic layers of the transformer
- Assess factual consistency of the internal representations
- Apply confidence-weighted corrections as needed
- Coordinate cross-layer verification to ensure global consistency
- Bayesian Verification Adapters: Produce uncertainty-weighted corrections to hidden states
- Residual Verification Networks: Use mixture-of-experts approach with dynamic routing
- Hierarchical Verification: Implements verification at token, phrase, and semantic levels
- Cross-Layer Verifier: Ensures consistency of representations across different layers
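A minimal sketch of the core adapter pattern (illustrative only: the class name, sizes, and wiring here are my shorthand, not the actual classes in `latent_verification`):

```python
import torch
import torch.nn as nn

class VerificationAdapterSketch(nn.Module):
    """LoRA-style bottleneck that scores a hidden state and applies a
    confidence-weighted residual correction."""

    def __init__(self, hidden_size: int = 3584, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
        self.confidence = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Propose a correction through the bottleneck
        correction = self.up(torch.tanh(self.down(hidden_states)))
        # Per-token confidence in [0, 1]; low confidence -> larger correction
        conf = self.confidence(hidden_states)
        return hidden_states + (1.0 - conf) * correction
```

The actual adapters build on this basic pattern with uncertainty weighting, mixture-of-experts routing, and cross-layer coordination.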
Tested with Python 3.11+ and PyTorch 2.5.1 (it should also work with PyTorch 2.0+).
# Clone this repository
git clone https://github.com/jacobwarren/Latent-Space-Verification-for-Self-Correcting-LLMs.git
cd Latent-Space-Verification-for-Self-Correcting-LLMs
# Create a conda environment with Python 3.11
conda create -n latent-verify python=3.11
conda activate latent-verify
# Install dependencies
pip install -r requirements.txt
from latent_verification.latent_verification import load_verification_model
from transformers import AutoTokenizer
# If using the downloaded model from Hugging Face:
# model = load_verification_model("jacobpwarren/Qwen2.5-7B-Latent_Verification")
# tokenizer = AutoTokenizer.from_pretrained("jacobpwarren/Qwen2.5-7B-Latent_Verification")
# Otherwise, load a local verification-enhanced model
model = load_verification_model("path/to/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/model")
# Generate text with verification enabled
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
from transformers import AutoModelForCausalLM
from latent_verification.latent_verification import create_verification_model
# Load base model
base_model = AutoModelForCausalLM.from_pretrained("your/model/name")
# Add verification adapters (default: every 3rd layer)
verified_model = create_verification_model(
base_model=base_model,
adapter_locations=[2, 5, 8, 11, 14, 17, 20, 23], # Customize adapter locations
bottleneck_size=64,
enable_cross_layer=True
)
# Fine-tune just the verification components
# (Base model parameters remain frozen)
# ... your fine-tuning code here ...
# Save the verification-enhanced model
verified_model.save_pretrained("path/to/save")
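After saving, the model can be reloaded with the same loader used in Quick Start:

```python
from latent_verification.latent_verification import load_verification_model

verified_model = load_verification_model("path/to/save")
```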
For more advanced training with verification-specific loss functions, you can write a script similar to:
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
from latent_verification.latent_verification import (
VerificationLoss, create_verification_model
)
# Create model with verification components (load the base model first, as above)
base_model = AutoModelForCausalLM.from_pretrained("your/model/name")
model = create_verification_model(base_model=base_model)
# Create verification-aware trainer
class VerificationTrainer(Trainer):
def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
outputs = model(**inputs)
# Use verification metrics for loss calculation
verification_metrics = getattr(outputs, "verification_metrics", None)
# Combine with regular loss
loss_fct = VerificationLoss(
task_loss_fn=lambda o, t: o.loss,
consistency_weight=0.1,
confidence_regularization_weight=0.05
)
loss = loss_fct(outputs, inputs["labels"], verification_metrics)
return (loss, outputs) if return_outputs else loss
# Training arguments
training_args = TrainingArguments(
output_dir="./verification_model",
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=1e-5,
num_train_epochs=3,
)
# Initialize trainer
trainer = VerificationTrainer(
model=model,
args=training_args,
train_dataset=your_dataset,
)
# Train
trainer.train()
For advanced verification capabilities, see `enhanced_verification.py`:
from latent_verification.enhanced_verification import (
create_enhanced_verification_model,
BayesianVerificationAdapter,
HierarchicalVerifier,
ResidualVerificationNetwork
)
# Create a model with Bayesian uncertainty estimation
bayesian_model = create_enhanced_verification_model(
model_name_or_path="your/model/name",
verification_type="bayesian"
)
# Create a model with hierarchical verification
hierarchical_model = create_enhanced_verification_model(
model_name_or_path="your/model/name",
verification_type="hierarchical"
)
# Create a model with residual verification networks
residual_model = create_enhanced_verification_model(
model_name_or_path="your/model/name",
verification_type="residual"
)
# And so forth...
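To make the Bayesian variant concrete, here is a rough sketch of uncertainty-weighted correction using Monte Carlo dropout; this is my illustration of the idea, not the implementation in `enhanced_verification.py`:

```python
import torch
import torch.nn as nn

def uncertainty_weighted_correction(adapter: nn.Module,
                                    hidden: torch.Tensor,
                                    n_samples: int = 8) -> torch.Tensor:
    adapter.train()  # keep dropout active so repeated passes differ
    # Sample several candidate corrections of the same hidden state
    corrections = torch.stack([adapter(hidden) - hidden for _ in range(n_samples)])
    mean_correction = corrections.mean(dim=0)
    # The spread of the samples serves as a per-token uncertainty estimate
    uncertainty = corrections.std(dim=0).mean(dim=-1, keepdim=True)
    # Shrink the applied correction where the adapter is uncertain
    return hidden + mean_correction / (1.0 + uncertainty)
```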
The `analysis` folder provides specialized tools to visualize and analyze verification behavior:
# Example usage of confidence analysis. The numbered scripts (e.g.,
# 5-confidence_analyzer.py) are run from the command line; their file names
# are not importable Python module names. From code (hypothetically):
from analysis.confidence_analyzer import ConfidenceAnalyzer
confidence_analyzer = ConfidenceAnalyzer(
verified_model_path="path/to/verified/model",
base_model_name="original/model/name",
output_dir="confidence_analysis"
)
confidence_results = confidence_analyzer.analyze_truth_falsehood_confidence()
My experiments demonstrate that latent-space verification significantly improves factual accuracy across multiple benchmarks:
- Base Model: 55.81%
- Verification-Enhanced Model: 65.48%
- Absolute Improvement: 9.67 percentage points
With minimal parameter overhead: only a 0.08% parameter increase (6.3M additional parameters on a 7.6B model).
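As a quick arithmetic check on that figure:

```python
extra_params, base_params = 6.3e6, 7.6e9
print(f"{extra_params / base_params:.4%}")  # 0.0829%, i.e., ~0.08%
```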
My analysis provides direct evidence for the "thinking in latent space" hypothesis:
- Verification layers create systematic transformations in embedding space
- Vector fields demonstrate directional correction patterns
- Confidence scores correlate with factual accuracy
- Token probability analysis shows shifts toward accurate completions
Key findings from my ablation studies:
- Cross-layer verification is critical (37% reduction in improvement when disabled)
- Learned confidence outperforms fixed confidence thresholds by 43%
- Layer placement significantly impacts performance (middle layers contribute most)
- Verification depth affects different error types (early layers catch syntactic errors, late layers address semantic/factual issues)
This repository provides a suite of scripts (numbered `1-` through `8-`) to train, evaluate, and analyze verification-enhanced models, plus an all-in-one `evaluation_suite.py`. Below is a high-level guide. All scripts assume you have installed the required dependencies (see Installation).
- Train the model using the example training script in `examples/training_script.py`:
cd examples
python training_script.py \
--model Qwen/Qwen2.5-1.5B-Instruct \
--verification_type bayesian \
--output_dir verification_model \
--batch_size 1 \
--gradient_accumulation_steps 16 \
--epochs 1 \
--max_samples 49664
- Wraps your chosen model with verification adapters.
- Applies custom verification-based loss terms.
- Trains only the new verification components (base model is frozen).
- Save the trained model (e.g., in `verification_model/model`).
- Run the analysis scripts in the `analysis` directory. Typically:
cd ../analysis
# 1) Evaluate verification performance
python 1-verification_evaluation.py
# 2) Create parameter-matched baselines
python 2-create_parameter_matched_baselines.py \
--model_name Qwen/Qwen2.5-7B-Instruct \
--verification_model ../verification_model/model \
--adapter_output ./adapter_baseline \
--lora_output ./lora_baseline \
--mlp_output ./mlp_baseline \
--device cuda
# 3) Generate ablation models
python 3-generate_ablation_models.py \
--model_path ../verification_model/model \
--output_dir ./ablation_models \
--tokenizer_path Qwen/Qwen2.5-7B-Instruct
# 4) Generate benchmark data
python 4-generate_benchmark_data.py \
--categories all \
--samples 200 \
--output_dir ./verification_benchmarks
# 5) Run confidence analysis
python 5-confidence_analyzer.py \
--verified_model ../verification_model/model \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir ./confidence_analysis \
--device cuda \
--run_all
# 6) Analyze "latent thinking"
python 6-latent_thinking_analyzer.py \
--base_model Qwen/Qwen2.5-7B-Instruct \
--verified_model ../verification_model/model \
--output_dir ./latent_thinking_analysis \
--all
# 7) Analyze embedding dynamics
python 7-embedding_dynamics_analyzer.py \
--verified_model ../verification_model/model \
--base_model Qwen/Qwen2.5-7B-Instruct \
--output_dir ./embedding_dynamics \
--device cuda \
--high_quality \
--run_all
# 8) Perform hidden-state verification checks
python 8-hidden_state_verification.py
latent-space/
├── analysis/
│ ├── __init__.py
│ ├── 1-verification_evaluation.py
│ ├── 2-create_parameter_matched_baselines.py
│ ├── 3-generate_ablation_models.py
│ ├── 4-generate_benchmark_data.py
│ ├── 5-confidence_analyzer.py
│ ├── 6-latent_thinking_analyzer.py
│ ├── 7-embedding_dynamics_analyzer.py
│ ├── 8-hidden_state_verification.py
│ ├── evaluation_suite.py
│ └── verification_benchmarks/ # benchmark datasets for local use
├── examples/
│ ├── __init__.py
│ └── training_script.py
├── latent_verification/
│ ├── __init__.py
│ ├── detect_qwen_layers.py
│ ├── enhanced_verification.py
│ └── latent_verification.py
├── requirements.txt
└── README.md
If you use this work in your research, please cite my GitHub repository:
@misc{warren2025latent,
title={Latent-Space Verification for Self-Correcting LLMs},
author={Warren, Jacob},
year={2025},
publisher={GitHub},
journal={GitHub repository},
howpublished={\url{https://github.com/jacobwarren/Latent-Space-Verification-for-Self-Correcting-LLMs}}
}
Warren, J. (2025). Latent-Space Verification for Self-Correcting LLMs [Computer software]. GitHub. https://github.com/jacobwarren/Latent-Space-Verification-for-Self-Correcting-LLMs
Warren, Jacob. "Latent-Space Verification for Self-Correcting LLMs." GitHub, 2025, github.com/jacobwarren/Latent-Space-Verification-for-Self-Correcting-LLMs.
J. Warren, "Latent-Space Verification for Self-Correcting LLMs," GitHub, 2025. [Online]. Available: https://github.com/jacobwarren/Latent-Space-Verification-for-Self-Correcting-LLMs
This project is licensed under the MIT License - see the LICENSE file for details.
- This research was conducted independently.
- I thank the HuggingFace team for their transformer implementations.
- Experiments were conducted using Qwen 2.5 models at both 1.5B and 7B parameter scales.