A lightweight, self‑contained Go implementation of a distributed key‑value store that can run in two modes:
- Standalone — single‑node, embedded WAL + embedded LSM engine (RoseDB).
- Cluster — sharded across writers, each with synchronous read‑replicas, automatic resharding and fail‑over coordinated through ZooKeeper.
The project is designed for experimentation around consensus‑less replication, slot‑based sharding and high availability.
Not production ready. The codebase is a research project. Use at your own risk.
1. Prerequisites
- Go >= 1.22
- ZooKeeper running (the easiest way is via Docker):
  ```sh
  docker-compose up -d
  ```
2. Start a simulated cluster
This starts one writer instance and one read‑replica, both running in the same process.
```sh
go run examples/cluster/main.go
```
3. Build the CLI
```sh
make build-cli
```
4. Run the CLI
```sh
./bin/cli localhost:17000
```
- Uses hash slots as the partitioning scheme, similar to Redis Cluster.
- 1024 fixed slots per key‑space (CRC‑32 % 1024) keep hashing cheap while allowing dynamic range assignment; a routing sketch follows this list.
- Writers own disjoint slot ranges; adding a new writer triggers online resharding that copies WAL segments for its range.
- Trade‑off: fixed slot count simplifies look‑ups but caps horizontal scalability.
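To make the routing concrete, here is a minimal sketch of the CRC‑32 % 1024 slot mapping described above. The `slotRange` type, the example range table and the addresses are illustrative assumptions, not the project's actual API.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const slotCount = 1024

// slotRange is a hypothetical [from, to] range of slots owned by one writer.
type slotRange struct {
	from, to uint32
	writer   string // writer address, e.g. "localhost:17000"
}

// slotFor maps a key to one of the 1024 fixed slots via CRC-32.
func slotFor(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key)) % slotCount
}

// writerFor walks the slot-range table and returns the owning writer.
func writerFor(key string, ranges []slotRange) (string, bool) {
	s := slotFor(key)
	for _, r := range ranges {
		if s >= r.from && s <= r.to {
			return r.writer, true
		}
	}
	return "", false
}

func main() {
	ranges := []slotRange{
		{from: 0, to: 511, writer: "localhost:17000"},
		{from: 512, to: 1023, writer: "localhost:17001"},
	}
	w, _ := writerFor("user:42", ranges)
	fmt.Printf("key %q -> slot %d -> writer %s\n", "user:42", slotFor("user:42"), w)
}
```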
- Synchronous quorum replication (2‑phase Prepare / Commit over RPC) per write.
- Parameters: `replicationFactor` (N) and `writeQuorum` (Q). A write succeeds when at least Q of the N replicas ACK the commit (Q ≤ N); a sketch of the write path follows this list.
- Consistency model: read‑your‑writes on writers; eventual on replicas.
- Trade‑off: to keep the protocol simple, linearizability is sacrificed during fail‑over.
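A rough sketch of that write path, assuming a hypothetical `Replica` client with `Prepare`/`Commit` methods; the real RPC client, rollback handling and concurrency are not shown.

```go
package replication

import (
	"context"
	"errors"
)

// Replica is a hypothetical client for the per-node RPC server.
type Replica interface {
	Prepare(ctx context.Context, op, key, value string) error
	Commit(ctx context.Context, op, key, value string) error
}

// CommitWrite runs the two-phase write: prepare on every replica, then commit
// once at least writeQuorum (Q) of the replicas have acknowledged.
func CommitWrite(ctx context.Context, replicas []Replica, writeQuorum int, op, key, value string) error {
	prepared := 0
	for _, r := range replicas {
		if err := r.Prepare(ctx, op, key, value); err == nil {
			prepared++
		}
	}
	if prepared < writeQuorum {
		// Quorum not reached: the caller would roll back the prepared replicas.
		return errors.New("write aborted: prepare quorum not reached")
	}
	committed := 0
	for _, r := range replicas {
		if err := r.Commit(ctx, op, key, value); err == nil {
			committed++
		}
	}
	if committed < writeQuorum {
		return errors.New("write failed: commit quorum not reached")
	}
	return nil
}
```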
- WAL first, LSM later: every mutation is encoded as `Op|Key|Value` and appended to a per‑node WAL before being applied to RoseDB; a sketch of the record format follows this list.
- An abstracted `FileSystem` interface lets you swap disk for in‑memory mocks in tests.
- Trade‑off: a single WAL file per node limits parallelism; compaction logic is delegated to RoseDB.
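A minimal sketch of the record encoding and the `FileSystem` seam; the field names, the separator framing and the interface methods are assumptions made for illustration.

```go
package wal

import "io"

// Record is one WAL entry: an operation applied to a key.
type Record struct {
	Op    string // e.g. "SET" or "DELETE" (operation names are assumed)
	Key   string
	Value string
}

// Encode frames the record as Op|Key|Value; a real encoding would
// length-prefix the fields so keys and values may contain '|'.
func (r Record) Encode() []byte {
	return []byte(r.Op + "|" + r.Key + "|" + r.Value + "\n")
}

// FileSystem is the seam that lets tests swap the disk for in-memory mocks.
type FileSystem interface {
	OpenAppend(path string) (io.WriteCloser, error)
}

// Append writes the encoded record to the node's WAL before the mutation
// is applied to the LSM engine (RoseDB).
func Append(fs FileSystem, path string, r Record) error {
	f, err := fs.OpenAppend(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(r.Encode())
	return err
}
```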
- ZooKeeper stores cluster metadata (node IDs, slot ranges, status) and provides watch‑based notifications.
- Each node exposes an RPC server (`CurrentOffset`, `Prepare`, `Commit`, etc.); a sketch of the interface follows this list.
- Trade‑off: the external dependency simplifies service discovery but introduces operational overhead.
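For orientation, the per‑node RPC surface might be summarized as below; only the method names come from the list above, while the request/response shapes and the `net/rpc`‑style signatures are assumptions.

```go
package rpcapi

// PrepareArgs carries a staged mutation for phase one of a write.
type PrepareArgs struct {
	Op, Key, Value string
	Offset         uint64 // WAL offset the writer expects the replica to be at
}

// CommitArgs finalizes a previously prepared mutation.
type CommitArgs struct {
	Offset uint64
}

// NodeService sketches the RPC server each node exposes.
type NodeService interface {
	// CurrentOffset reports how far the node's WAL has advanced; useful for
	// comparing replica freshness, e.g. during promotion.
	CurrentOffset(args struct{}, reply *uint64) error
	// Prepare stages a mutation; the node ACKs if it can apply it.
	Prepare(args PrepareArgs, reply *bool) error
	// Commit makes a prepared mutation durable and visible.
	Commit(args CommitArgs, reply *bool) error
}
```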
- If a writer goes down and a quorum of replicas is still reachable, one replica is promoted through a best‑effort election (a promotion sketch follows this list).
- WAL offset checks + rollback paths guard against partial commits when a replica dies mid‑transaction.
- Trade‑off: no long‑running leader election; transient split‑brain is possible under extreme network partitions.
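A best‑effort promotion under these rules could look roughly like this; the `replicaInfo` type and the `quorum` parameter are illustrative assumptions.

```go
package failover

import "errors"

// replicaInfo is a hypothetical snapshot of a replica's health and progress.
type replicaInfo struct {
	Addr      string
	Reachable bool
	WALOffset uint64 // how far this replica's WAL has advanced
}

// pickNewWriter promotes the reachable replica with the most complete WAL,
// but only if at least quorum replicas can still be reached.
func pickNewWriter(replicas []replicaInfo, quorum int) (string, error) {
	reachable := 0
	best := -1
	for i, r := range replicas {
		if !r.Reachable {
			continue
		}
		reachable++
		if best == -1 || r.WALOffset > replicas[best].WALOffset {
			best = i
		}
	}
	if reachable < quorum || best == -1 {
		return "", errors.New("promotion aborted: quorum of replicas not reachable")
	}
	return replicas[best].Addr, nil
}
```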