Key-Value Store

A high-performance key-value store built in Rust with Log-Structured Merge (LSM) tree architecture.

Features

In-memory memtable for fast writes (implemented as AVL tree)
Sorted String Tables (SST) on disk with multiple implementation options
- Array SST: simple sorted files
- B-tree SST: optimized for faster reads
Configurable buffer pool for page caching using extendible hash table
Bloom filters to optimize negative lookups
Multiple compaction policies:
- None: no compaction
- Leveled: compaction by level
- Tiered: compaction by size ratio
- Dostoevsky: hybrid policy using tiered for lower levels and leveled for highest level
Core operations:
- get: Retrieve a value by key
- put: Insert or update a key-value pair
- scan: Range scan for key-value pairs within a range
- delete: Remove a key-value pair

Installation

Make sure you have Rust and Cargo installed. Then:

git clone <repository-url>
cd key-value-store
cargo build --release

Usage

Basic Example

use key_value::db::Database;

// Open or create a database
let mut db = Database::open("my_database");

// Customize configuration (optional)
let db = db.set_memtable_capacity_mb(1)
          .set_buffer_pool_capacity_mb(10)
          .set_enable_bloom_filter(true)
          .set_bloom_filter_bits_per_entry(5);

// Basic operations
db.put(1, 100);                  // Insert key-value pair
let value = db.get(1);           // Retrieve value (returns Option<i64>)
let range = db.scan(1, 10);      // Scan for keys in range [1, 10]
db.delete(1);                    // Delete key

// Close database (or let it drop automatically)
db.close();

Configuration Options

The database is highly configurable with the following options:

Memtable Configuration

memtable_capacity(): Get maximum entries in memtable
set_memtable_capacity(capacity): Set max entries in memtable
set_memtable_capacity_mb(capacity_mb): Set max size in MB

SST Configuration

sst_size_ratio(): Get size ratio between SST levels
set_sst_size_ratio(ratio): Set size ratio for different levels
sst_implementation(): Get current SST implementation
set_sst_implementation(impl): Set implementation (Array or Btree)
sst_search_algorithm(): Get search algorithm for SSTs
set_sst_search_algorithm(algo): Set search algorithm (Default or BinarySearch)

Buffer Pool Configuration

enable_buffer_pool(): Check if buffer pool is enabled
set_enable_buffer_pool(enable): Enable/disable buffer pool
buffer_pool_capacity(): Get max entries in buffer pool
set_buffer_pool_capacity(capacity): Set max entries
set_buffer_pool_capacity_mb(capacity_mb): Set max size in MB
buffer_pool_initial_size(): Get initial size before expansion
set_buffer_pool_initial_size(size): Set initial size
set_buffer_pool_initial_size_mb(size_mb): Set initial size in MB

Compaction Configuration

compaction_policy(): Get current compaction policy
set_compaction_policy(policy): Set policy (None, Leveled, Tiered, Dostoevsky)

Bloom Filter Configuration

enable_bloom_filter(): Check if bloom filter is enabled
set_enable_bloom_filter(enable): Enable/disable bloom filter
bloom_filter_bits_per_entry(): Get bits per entry in bloom filter
set_bloom_filter_bits_per_entry(bits): Set bits per entry

Architecture

Overview

The database follows an LSM tree architecture with the following components:

Memtable: An in-memory AVL tree that stores recently written data
Sorted String Tables (SSTs): On-disk sorted files organized in levels
Buffer Pool: Caches frequently accessed disk pages
Bloom Filters: Probabilistic data structure to optimize negative lookups

Data Flow

put operations insert data into the memtable
When memtable reaches capacity, it's flushed to disk as an SST file
get operations search the memtable first, then SSTs from youngest to oldest
scan operations search all sources and merge results to provide ordered data
Compaction merges multiple SSTs to maintain performance

Implementation Details

Memtable (AVL Tree)

Implemented as a self-balancing AVL tree for efficient in-memory storage
Supports fast insertion, lookup, and in-order traversal operations
Trades slightly slower writes for marginally faster reads compared to red-black trees

SST Implementations

Array SST
- Key-value pairs stored contiguously in little-endian binary format
- Binary search used for lookups
- Simple implementation with good performance for smaller datasets
B-tree SST
- Separated into two files: one for leaf nodes containing entries, one for inner nodes
- Maximizes fanout by storing only delimiters in inner nodes
- Enables faster lookups by reducing tree height
- Designed as a complete B-tree for efficient navigation

Buffer Pool

Implemented using an extendible hash table for page caching
Uses xxHash64 for high-performance uniform hashing
Hybrid Clock/LRU eviction policy:
- Clock algorithm operates at the bucket level
- LRU implemented within buckets for efficient page replacement
- Accessed pages moved to the end of their bucket's list
- Evicts least recently used page from buckets with access bit 0

Bloom Filters

Probabilistic data structure for efficient negative lookups
Uses up to M*ln2 hash functions (M = bits per entry)
Implemented with xxHash64 using different seed values for each hash function
Enables skipping SST searches when a key definitely doesn't exist

Compaction

Merge sorting algorithm based on 2-pass external sorting
For B-trees, implements a scalable node-building algorithm that:
- Requires only one scan of the sorted file
- Uses $O(\text{levels} \times \text{fanout})$ memory
- Costs $O(N/B)$ I/Os to scan the file and $O(\text{inner nodes})$ I/Os to write
- Where $N$ is the number of entries and $B$ is the page size (number of entries per page)

Performance Considerations

Memtable size affects write amplification and flush frequency
Buffer pool size affects read performance for repeated queries
Bloom filter bits per entry affects false positive rate
Compaction policy affects read/write performance trade-offs
B-tree vs Array SST choice affects read performance especially for larger datasets

Running Tests

Run all tests:

cargo test

Run experiments:

cargo run --bin experiments --release

Limitations and Considerations

Changing certain database settings after data has been written may lead to undefined behavior:
- Changing bloom filter bits per entry after creation will lead to incorrect queries
- Changing SST implementation or disabling bloom filters may leave unused files
The database uses i64::MIN as a tombstone value for deleted keys
Database names must be valid directory names without whitespace

Name		Name	Last commit message	Last commit date
Latest commit History 146 Commits
experiments		experiments
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Key-Value Store

Features

Installation

Usage

Basic Example

Configuration Options

Memtable Configuration

SST Configuration

Buffer Pool Configuration

Compaction Configuration

Bloom Filter Configuration

Architecture

Overview

Data Flow

Implementation Details

Memtable (AVL Tree)

SST Implementations

Buffer Pool

Bloom Filters

Compaction

Performance Considerations

Running Tests

Limitations and Considerations

About

Uh oh!

Releases

Packages

Uh oh!

Languages

AndyKaySum/key-value-store

Folders and files

Latest commit

History

Repository files navigation

Key-Value Store

Features

Installation

Usage

Basic Example

Configuration Options

Memtable Configuration

SST Configuration

Buffer Pool Configuration

Compaction Configuration

Bloom Filter Configuration

Architecture

Overview

Data Flow

Implementation Details

Memtable (AVL Tree)

SST Implementations

Buffer Pool

Bloom Filters

Compaction

Performance Considerations

Running Tests

Limitations and Considerations

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages