Veritas: A Scientist for Autonomous Research

One of the grand challenges of artificial intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have been used to assist human scientists—for example, in brainstorming ideas or writing code—they still require extensive manual supervision or are constrained to narrow, task-specific use cases.

Veritas is a comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently. It runs locally on Mistral 7B, so all data stays on the user's machine. Retrieval-Augmented Generation (RAG) reduces citation hallucinations, and QLoRA (Quantized Low-Rank Adaptation) supports customizable scientific writing styles. Veritas also integrates LongLoRA for context extension, allowing input windows of over 100,000 tokens to support long-form research workflows.

Veritas was developed during Major League Hacking’s Global Hack Week: Open Source (May 9–15, 2025). At the end of the week, MLH ranked me among the top 1% of participating hackers. I am convinced that in the coming years, tools like Veritas will evolve significantly and drive a paradigm shift, playing a leading role in the production of scientific knowledge.

Research Outputs

Example outputs are machine learning research papers on a range of emerging topics, including diffusion modeling, language generation, and grokking dynamics.

Note: While all core modules of Veritas have been validated, a production-grade RAG pipeline is still under development. The papers currently showcased were generated by the AI Scientist.

Research Workflow

Veritas mirrors the architecture of The AI Scientist (GPT-4) and implements the full research pipeline:

1. Idea Generation

- Receives a topic template
- Brainstorms novel research directions
- Validates novelty using Semantic Scholar
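
The README does not say how novelty is scored once Semantic Scholar results come back. As a rough sketch, assuming the search returns a list of paper titles, one simple heuristic is to compare the idea's title against each known title by word overlap (Jaccard similarity). The function names and the 0.6 threshold below are hypothetical and are not part of Veritas.

```python
import re
from typing import Iterable, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def max_title_overlap(idea_title: str, known_titles: Iterable[str]) -> float:
    """Return the highest Jaccard similarity between the idea and any known title."""
    idea = _tokens(idea_title)
    best = 0.0
    for title in known_titles:
        other = _tokens(title)
        union = idea | other
        if union:
            best = max(best, len(idea & other) / len(union))
    return best


def looks_novel(idea_title: str, known_titles: Iterable[str], threshold: float = 0.6) -> bool:
    """Treat the idea as novel when no known title overlaps above the threshold."""
    return max_title_overlap(idea_title, known_titles) < threshold


# Titles as they might come back from a literature search (illustrative data).
known = ["Denoising Diffusion Probabilistic Models"]
print(looks_novel("Grokking in small transformers", known))
```

A title-level check like this is only a first filter; a real pipeline would likely also compare abstracts or ask the model to judge the retrieved papers.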

2. Experimental Iteration

- Executes code for proposed methods
- Collects outputs and visualizations
- Annotates each result for interpretation

3. Paper Write-up

- Generates a LaTeX-formatted scientific paper
- Autonomously sources relevant citations

4. Automated Peer Review

- Uses a custom LLM reviewer aligned with ML conference standards
- Evaluates novelty, clarity, rigor
- Feeds back into the system for future iterations
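
The four stages above form a loop: the review from one round becomes input to the next. The sketch below shows one way that control flow could look. It is not the Veritas implementation; `run_pipeline`, `Review`, `Round`, the stage callables, and the acceptance threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    """Reviewer scores for the three criteria named in the workflow."""
    novelty: float
    clarity: float
    rigor: float

    def overall(self) -> float:
        return (self.novelty + self.clarity + self.rigor) / 3


@dataclass
class Round:
    idea: str
    results: List[str]
    paper: str
    review: Review


def run_pipeline(
    topic: str,
    generate_idea: Callable[[str, str], str],
    run_experiments: Callable[[str], List[str]],
    write_paper: Callable[[str, List[str]], str],
    review_paper: Callable[[str], Review],
    max_rounds: int = 3,
    accept_score: float = 7.0,
) -> List[Round]:
    """Run idea -> experiments -> write-up -> review until accepted or out of rounds."""
    rounds: List[Round] = []
    feedback = ""
    for _ in range(max_rounds):
        idea = generate_idea(topic, feedback)
        results = run_experiments(idea)
        paper = write_paper(idea, results)
        review = review_paper(paper)
        rounds.append(Round(idea, results, paper, review))
        if review.overall() >= accept_score:
            break
        # Feed the review back into the next round of idea generation.
        feedback = f"last overall score was {review.overall():.1f}"
    return rounds


# Stub stages stand in for the LLM-backed components.
history = run_pipeline(
    "grokking dynamics",
    generate_idea=lambda topic, feedback: f"idea about {topic}",
    run_experiments=lambda idea: ["loss curve"],
    write_paper=lambda idea, results: f"Paper on: {idea}",
    review_paper=lambda paper: Review(novelty=6, clarity=7, rigor=5),
    max_rounds=2,
)
print(len(history), "round(s) completed")
```

Passing the stages in as callables keeps the loop independent of any particular model, which makes it easy to test with stubs as shown.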

System Requirements

- Apple Silicon Mac (M1, M2, M3, or M4)
- macOS Monterey or later
- 16GB RAM minimum (32GB+ recommended, 128GB optimal for M4)
- 8GB+ free storage (SSD recommended); note that the optional Mistral model download alone is 13GB+
- Python 3.9 or higher
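
These requirements can be checked before installing. The following is a minimal sketch using only the standard library; `check_requirements` and `current_macos_major` are hypothetical helpers, not part of Veritas. Installed RAM is passed in as an argument because the standard library has no portable way to read it.

```python
import platform
import sys
from typing import List, Tuple


def check_requirements(
    system: str,
    machine: str,
    macos_major: int,
    python_version: Tuple[int, int],
    ram_gb: float,
) -> List[str]:
    """Return a list of unmet requirements (empty when everything passes)."""
    problems: List[str] = []
    if system != "Darwin" or machine != "arm64":
        problems.append("an Apple Silicon Mac (arm64) is required")
    elif macos_major < 12:
        # macOS Monterey is version 12.
        problems.append("macOS Monterey (12) or later is required")
    if python_version < (3, 9):
        problems.append("Python 3.9 or higher is required")
    if ram_gb < 16:
        problems.append("at least 16GB of RAM is required")
    return problems


def current_macos_major() -> int:
    """Major macOS version, or 0 when not running on macOS."""
    release = platform.mac_ver()[0]  # empty string on other systems
    return int(release.split(".")[0]) if release else 0


# ram_gb is supplied by hand here; 16.0 is just a placeholder value.
report = check_requirements(
    platform.system(),
    platform.machine(),
    current_macos_major(),
    sys.version_info[:2],
    ram_gb=16.0,
)
print(report or "all checks passed")
```

Note that a Python interpreter running under Rosetta reports `x86_64` from `platform.machine()`, so this check would flag it even on Apple Silicon hardware.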

Installation

We provide a unified installation script that handles everything for you:

# Clone the repository
git clone https://github.com/yourusername/veritas.git
cd veritas

# Basic installation (using convenience script)
./install.sh

# Or directly from tools directory
python tools/install.py

# To also download the Mistral model (optional, 13GB+)
python tools/install.py --download-model

# More installation options
python tools/install.py --upgrade                 # Upgrade existing dependencies
python tools/install.py --ignore-errors           # Continue even if some steps fail
python tools/install.py --skip-dependencies       # Skip installing dependencies
python tools/install.py --model "mistralai/Mistral-7B-v0.2"  # Specify model to download

The installation script:

- Creates necessary directories
- Installs all dependencies for both RAG and AI Scientist
- Sets up the package for development
- Creates basic research templates for AI Scientist
- Optionally downloads the Mistral model

After installation, you can use the command-line tools:

# Use main interface
veritas

# Use AI Scientist directly
veritas-ai-scientist

# See all available options
veritas --help

Manual Installation

If you prefer manual installation:

Clone the repository:

git clone https://github.com/yourusername/veritas.git
cd veritas

Install dependencies:
```
pip install -r requirements.txt
```
Install the package:
```
pip install -e .
```

Download and prepare the Mistral model (if needed):

mkdir -p models/mistral
python -c "from huggingface_hub import snapshot_download; snapshot_download('mistralai/Mistral-7B-v0.2', local_dir='models/mistral')"

Quick Start

Run the unified terminal interface:

# Start with RAG system (default)
python scripts/run.py

# Start with AI Scientist
python scripts/run.py --system ai_scientist

# Show all options
python scripts/run.py --help
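
For readers curious how a `--system` switch with a RAG default might be wired up, here is a minimal `argparse` sketch. It is an illustration only, not the contents of `scripts/run.py`; in particular, the choice value `rag` is an assumption, since the README only shows `ai_scientist` being passed explicitly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --system option that defaults to the RAG system."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Unified terminal interface (illustrative sketch).",
    )
    parser.add_argument(
        "--system",
        choices=["rag", "ai_scientist"],  # "rag" is an assumed name for the default mode
        default="rag",
        help="which system to start with (default: rag)",
    )
    return parser


# Parsing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--system", "ai_scientist"])
print(args.system)
```

`argparse` adds `--help` automatically, which matches the last command shown above.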

Using the RAG System

The RAG system allows you to ask questions about your documents:

python scripts/run.py

This will start the RAG system with the terminal UI, where you can directly ask questions.

Using AI Scientist

To use the AI Scientist component:

# Direct launch
python scripts/run.py --system ai_scientist

# Or start with RAG and switch
python scripts/run.py
# Then type 'scientist' at the prompt

Or run a simple test:

# Navigate to the AI Scientist directory
cd src/veritas/ai_scientist

# Simple test that generates one idea
python test_simple.py

For more information, see the AI Scientist README.

Architecture

Veritas is designed with a clear separation of concerns:

- Core RAG Implementation (src/veritas/rag.py): The heart of the system that handles retrieval and generation
- Application Layer (scripts/run.py): Configures and uses the core RAG system for specific use cases
- Configuration (src/veritas/config.py): Centralized settings for the entire system
- Apple Silicon Optimizations (src/veritas/mps_utils.py): Specialized utilities for Apple's Metal framework
- Text Processing (src/veritas/chunking.py): Document segmentation for efficient indexing and retrieval
- AI Scientist (src/veritas/ai_scientist): Research assistant built on top of our RAG system

UML Class Diagram

┌─────────────┐     ┌───────────────┐
│ MistralModel│     │   RAGSystem   │
│ (run.py)    │────>│  (rag.py)     │
└─────────────┘     └───────────────┘
       │                   │
       │                   │
       ▼                   ▼
┌─────────────┐     ┌───────────────┐
│ ModelConfig │     │    Config     │
└─────────────┘     └───────────────┘
                           │
                           ▼
                    ┌───────────────┐
                    │  mps_utils    │
                    └───────────────┘

Core Components

RAGSystem (src/veritas/rag.py)

The main class that implements the RAG functionality:

from veritas import RAGSystem

# Create a RAG system
rag = RAGSystem(
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    llm_model="models/mistral-7b",
    index_path="models/faiss",
    device="mps"  # Use Apple Silicon acceleration
)

# Generate a complete RAG response
response = rag.generate_rag_response(
    query="How does a RAG system work?",
    top_k=5,  # Number of chunks to retrieve
    max_new_tokens=200
)

print(response["combined_response"])
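
To make the retrieve-then-generate idea behind `generate_rag_response` concrete, the toy sketch below ranks chunks against a query and assembles a prompt from the top matches. It swaps in bag-of-words cosine similarity for the sentence-transformer embeddings and FAISS index that the example above configures, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 5) -> List[Tuple[float, str]]:
    """Return the top_k (score, chunk) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, retrieved: List[Tuple[float, str]]) -> str:
    """Place the retrieved chunks ahead of the question for the language model."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, (_, chunk) in enumerate(retrieved))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "RAG retrieves relevant chunks before generation.",
    "Bananas are yellow.",
    "FAISS stores vector embeddings.",
]
top = retrieve("How does RAG retrieve chunks?", chunks, top_k=1)
print(build_prompt("How does RAG retrieve chunks?", top))
```

The resulting prompt would then be passed to the language model, which is the step that `max_new_tokens` limits.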

MistralModel (scripts/run.py)

A wrapper around RAGSystem that handles configuration and initialization:

from src.veritas.config import Config
from scripts.run import MistralModel, ModelConfig

# Configure the model
config = ModelConfig(
    model_name=Config.LLM_MODEL,
    max_new_tokens=200,
    temperature=0.3,
    max_retrieved_chunks=3
)

# Create and load model
model = MistralModel(config)
model.load()

# Generate a response with context
context, direct_response, combined_response = model.generate(
    "What are the advantages of RAG systems over pure LLMs?"
)
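
The fields passed to `ModelConfig` above suggest a plain settings object. As a guess at its shape rather than the actual class in `scripts/run.py`, it could be a small dataclass with basic validation. `ModelConfigSketch` and its defaults are hypothetical and only mirror the values used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical stand-in for ModelConfig; field names follow the example above."""
    model_name: str
    max_new_tokens: int = 200
    temperature: float = 0.3
    max_retrieved_chunks: int = 3

    def __post_init__(self) -> None:
        # Reject values that could never produce a sensible generation call.
        if self.max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be positive")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_retrieved_chunks <= 0:
            raise ValueError("max_retrieved_chunks must be positive")


config = ModelConfigSketch(model_name="models/mistral-7b", max_retrieved_chunks=5)
print(config)
```

Freezing the dataclass prevents settings from changing after the model has been loaded with them.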

AI Scientist (src/veritas/ai_scientist)

A research assistant built on top of our Mistral model with RAG capabilities:

from src.veritas.ai_scientist.run_scientist import AIScientist

# Create an AI Scientist instance
scientist = AIScientist(
    experiment="nanoGPT_lite", 
    num_ideas=1
)

# Generate research ideas
ideas = scientist.generate_ideas()

# Print the generated ideas
for idea in ideas:
    print(f"Idea: {idea['title']}")
    print(f"Description: {idea['description']}")
    print(f"Novelty: {idea['novelty_score']}")

Advanced Usage

Custom Document Chunking

from veritas import chunk_text, get_chunk_size

# Get optimal chunk size based on document length
# (large_document is assumed to be a str holding the full text to index)
document_length = len(large_document)
chunk_size = get_chunk_size(document_length, target_chunks=20)

# Generate chunks with custom parameters
chunks = chunk_text(
    text=large_document,
    chunk_size=chunk_size,
    overlap=100  # Words of overlap between chunks
)

# Process each chunk
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk[:50]}...")
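
The `overlap` parameter is described as a number of words shared between neighbouring chunks. A minimal sketch of that sliding-window idea follows. It assumes `chunk_size` is also measured in words, which the README does not state, and `chunk_words` is a hypothetical function rather than the library's `chunk_text`.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of chunk_size words, each sharing overlap words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
for i, chunk in enumerate(chunk_words(sample, chunk_size=4, overlap=1)):
    print(f"Chunk {i}: {chunk}")
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one side, at the cost of indexing some words twice.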

Memory Optimization

from veritas.mps_utils import optimize_memory_for_m4, clear_mps_cache

# Apply comprehensive M4 optimizations at startup
optimize_memory_for_m4()

# Clear cache after heavy operations
# (model is assumed to be a loaded MistralModel and complex_query a str)
result = model.generate(complex_query)
clear_mps_cache()  # Free up GPU memory
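
The README does not show what `clear_mps_cache` does internally. One plausible sketch, assuming PyTorch 2.0 or later, is to run Python's garbage collector and then ask PyTorch to release cached Metal memory. The function name `clear_accelerator_cache` is hypothetical, and the code falls back gracefully when PyTorch or MPS is unavailable.

```python
import gc


def clear_accelerator_cache() -> bool:
    """Free unreferenced Python objects, then release cached MPS memory if possible.

    Returns True when the MPS cache was cleared, False otherwise.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed; nothing more to do
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is None or not mps_backend.is_available():
        return False  # no Metal device on this machine
    if not hasattr(torch, "mps"):
        return False  # older PyTorch without torch.mps.empty_cache
    torch.mps.empty_cache()
    return True


cleared = clear_accelerator_cache()
print("MPS cache cleared" if cleared else "MPS not available; skipped")
```

Running the garbage collector first matters because tensors that are still referenced from Python cannot be returned to the system by emptying the cache.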

Switching Between RAG and AI Scientist

The unified interface allows you to switch between modes during a session:

# Start with RAG
python scripts/run.py

# Type 'scientist' at the prompt to switch to AI Scientist mode
# Select option 4 to return to RAG mode
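
The switching behaviour described in the comments above amounts to a two-state machine. The sketch below captures just that rule. The mode names and the `next_mode` function are hypothetical, and any other input is assumed to leave the mode unchanged.

```python
def next_mode(mode: str, user_input: str) -> str:
    """Return the mode after one line of user input.

    'scientist' at the RAG prompt switches to AI Scientist mode;
    option '4' in AI Scientist mode returns to RAG mode.
    """
    command = user_input.strip().lower()
    if mode == "rag" and command == "scientist":
        return "ai_scientist"
    if mode == "ai_scientist" and command == "4":
        return "rag"
    return mode  # anything else is handled within the current mode


mode = "rag"
for line in ["What is RAG?", "scientist", "1", "4"]:
    mode = next_mode(mode, line)
    print(f"{line!r} -> {mode}")
```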

Performance Optimization

Veritas includes several optimizations for Apple Silicon:

- MPS Acceleration: Uses Metal Performance Shaders for faster computation
- Memory Management: Carefully controls memory usage to prevent OOM errors
- Half-Precision: Uses FP16 where possible for better performance
- Caching Control: Explicit cache clearing to prevent memory leaks
- SSD Offloading: Uses SSD for temporary files to reduce RAM pressure
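
The first and third items above (MPS acceleration and half precision) usually come down to choosing a device and a matching dtype at startup. The following sketch shows one way to make that choice with a CPU fallback. `pick_device` and `pick_dtype_name` are hypothetical helpers, and using FP32 on CPU is an assumption rather than something the README specifies.

```python
def pick_device() -> str:
    """Return 'mps' when PyTorch can use Apple's Metal backend, else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


def pick_dtype_name(device: str) -> str:
    """Use half precision on the GPU and full precision on the CPU (assumed policy)."""
    return "float16" if device == "mps" else "float32"


device = pick_device()
print(f"device={device}, dtype={pick_dtype_name(device)}")
```

The returned dtype name can be turned into a PyTorch dtype with `getattr(torch, name)` when loading the model.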
