ChunkHound (ofriw/chunkhound)

Modern RAG for your codebase - semantic and regex search via MCP.

Transform your codebase into a searchable knowledge base for AI assistants, using semantic search (built on the cAST chunking algorithm) and regex search. ChunkHound integrates with AI assistants via the Model Context Protocol (MCP).

Features

- cAST Algorithm - Research-backed semantic code chunking
- Semantic search - Natural-language queries like "find authentication code"
- Regex search - Pattern matching without API keys
- Local-first - Your index stays on your machine; use Ollama to keep your code fully local
- 22 languages with structured parsing
  - Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Bash, MATLAB, Makefile
  - Configuration (via Tree-sitter): JSON, YAML, TOML, Markdown
  - Text-based (custom parsers): Text files, PDF
- MCP integration - Works with Claude, VS Code, Cursor, Windsurf, Zed, and more

Documentation

Visit ofriw.github.io/chunkhound for complete guides.

Requirements

- Python 3.10+
- uv package manager
- An embedding provider for semantic search (optional; regex search works without any keys)
  - OpenAI or VoyageAI (API key required), or local models with Ollama

Installation

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install ChunkHound
uv tool install chunkhound

Quick Start

Option 1: With Embeddings (Recommended)

Create a .chunkhound.json file in your project root:

{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}

Then index your codebase:

chunkhound index
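Once chunks are embedded, a natural-language query is embedded too and compared against the stored chunk vectors. A toy sketch of that ranking step using cosine similarity, a common choice; whether ChunkHound uses this exact metric is an assumption. The hand-written 3-dimensional vectors stand in for real embeddings from OpenAI, VoyageAI, or Ollama, and `rank_chunks` is a hypothetical helper, not ChunkHound's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec: list[float], index: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query vector, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (made-up values for illustration).
index = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "auth.py:verify_token": [0.8, 0.3, 0.1],
    "db.py:connect": [0.1, 0.9, 0.2],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "find authentication code"
print(rank_chunks(query, index, k=2))
```

With these vectors the two authentication chunks score far above `db.py:connect`, which is the behavior semantic search relies on: similar meaning, similar direction in embedding space.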

Option 2: Without Embeddings (Regex Search Only)

chunkhound index --no-embeddings
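Regex-only mode needs no embedding provider because matching is purely lexical. A minimal sketch of the idea with Python's `re` module over files on disk; `regex_search` is a hypothetical helper and not ChunkHound's implementation.

```python
import re
from pathlib import Path


def regex_search(root: Path, pattern: str, glob: str = "*.py") -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every line under root matching pattern."""
    rx = re.compile(pattern)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```

For example, `regex_search(Path("."), r"def\s+\w*auth\w*\(")` lists every function definition whose name contains "auth", with no API key involved.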

For configuration, IDE setup, and advanced usage, see the documentation.

Why ChunkHound?

Research Foundation: Built on the cAST (Chunking via Abstract Syntax Trees) algorithm from Carnegie Mellon University, providing:

- 4.3-point gain in Recall@5 on RepoEval retrieval
- 2.67-point gain in Pass@1 on SWE-bench generation
- Structure-aware chunking that preserves code meaning
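For readers unfamiliar with the metrics: Recall@5 is commonly defined as the share of relevant items that appear among the top five retrieved results, averaged over queries, and Pass@1 as the share of tasks solved by a single generated attempt. A toy sketch of both definitions; the data is made up, `pass_at_k` uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k), and neither helper is the benchmark harness used in the cAST evaluation.

```python
from math import comb


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of relevant items found in the top-k retrieved results for one query."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one task with n samples, c of which pass."""
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up example data: two queries and two tasks.
queries = [
    (["a", "b", "c", "d", "e", "f"], {"a", "f"}),  # "f" is ranked 6th, so it is missed
    (["x", "y", "z", "w", "v"], {"y"}),
]
mean_recall = sum(recall_at_k(r, rel) for r, rel in queries) / len(queries)
tasks = [(10, 3), (10, 0)]  # (samples generated, samples that passed tests)
mean_pass1 = sum(pass_at_k(n, c, 1) for n, c in tasks) / len(tasks)
print(f"Recall@5 = {mean_recall:.2f}, Pass@1 = {mean_pass1:.2f}")  # prints "Recall@5 = 0.75, Pass@1 = 0.15"
```

On this reading, the gains quoted above are differences in these scores expressed in points.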

Local-First Architecture:

- Your index stays on your machine, and with local Ollama models your code never leaves it
- Works offline with Ollama local models
- No per-token embedding charges when using local models, even for large codebases

Universal Language Support:

- Structured parsing for 22 languages (Tree-sitter + custom parsers)
- Same semantic concepts across all supported languages

License

MIT
