Skip to content

Lightweight key-value based storage engine, supporting indexed data access through B+ trees written in c++ from scratch

Notifications You must be signed in to change notification settings

theweird-kid/pebbleDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PebbleDB

PebbleDB is a lightweight key-value storage engine written in C++, designed for learning and exploring the internals of databases and storage systems. It focuses on core components like B+ Trees, buffer pool management, persistence, and concurrency — similar to the storage layer of modern databases like PostgreSQL, RocksDB, and others.

⚠️ This is a learning-oriented project, not intended for production use.


🎯 Goals

  • ✅ Build a custom key-value store from scratch in C++.
  • ✅ Understand and implement B+ Trees, buffer pool, and page-based file storage.
  • ✅ Learn about persistent data storage, page layout, and heap file management.
  • ✅ Explore concurrent-safe reads and writes with coarse or fine-grained locking.
  • ✅ Dive into log-based durability, crash recovery, and eventually replication.
  • ✅ Keep the codebase modular and extensible for experimentation.

🧱 Architecture Overview

  • Heap File – Stores variable-length key-value records with slotted page layout.
  • B+ Tree Index – Used for fast lookup, insertion, and range queries over keys.
  • Buffer Pool – Caches pages in memory with LRU-style eviction and dirty page flushing.
  • Catalog Manager – Tracks metadata like root pages and collection info.
  • Write-Ahead Log (WAL) – Persists operations to disk before applying them.
  • Concurrency Layer (planned) – Reader-writer locking for thread-safe operations.

📈 Development Phases

✅ Phase 1 — Core Infrastructure

  • Page abstraction and file-backed heap file.
  • Slotted page layout for variable-length key-value records.
  • Basic B+ Tree index (with insert, search).
  • Metadata (Catalog) page to track collections and root nodes.

🔄 Phase 2 — Persistence & WAL

  • Page-level Write-Ahead Logging (WAL) with redo logging.
  • WAL replay and recovery on startup.
  • Configurable flush and checkpointing mechanism.

🧠 Phase 3 — Concurrency & Safety

  • Coarse-grained locks for concurrent-safe read/write.
  • Page-level latches or record-level locks.
  • Plan for future MVCC-based reads (optional).

📦 Phase 4 — Usability & Extensions

  • Add a CLI or C++ client interface to use the key-value store.
  • Range scan API with B+ Tree iterator.
  • Add unit and stress tests for concurrency, recovery, and crash simulation.

🔧 Getting Started

🛠 Build (Linux/macOS/WSL)

mkdir build
cd build
cmake -G Ninja -DCMAKE_CXX_COMPILER=clang++ ..
ninja

About

Lightweight key-value based storage engine, supporting indexed data access through B+ trees written in c++ from scratch

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published