ShorterDB is a lightweight, embedded key-value store inspired by popular databases like RocksDB and LevelDB. It is designed to provide a simple and extensible architecture for learning and experimentation. While it may not match the performance of production-grade systems, it offers a clear and modular implementation of key-value store concepts, including Write-Ahead Logging (WAL), Memtables, and Sorted String Tables (SSTs).
- Introduction
- Installation
- Features
- Examples
- Code Walkthrough
- Limitations
- Future Work
- Architecture Overview
- Conclusion
- Contributing
To use ShorterDB in your Rust project, add the following to your Cargo.toml
:
[dependencies]
shorterdb = "0.1.0"
For building the project locally, ensure you have Rust installed. Clone the repository and run:
git clone https://github.com/your-repo/shorterdb.git
cd shorterdb
cargo build
ShorterDB is a simple key-value store built using a De-LSM architecture. It is designed for educational purposes and provides a modular implementation of database components. The project includes examples for embedded usage, gRPC-based remote access, and CSV imports.
- Embedded Database: Use ShorterDB as a lightweight, file-based key-value store.
- gRPC Server: Access the database remotely using gRPC.
- REPL Interface: Interact with the database in a command-line interface.
- Write-Ahead Logging (WAL): Ensure durability by logging all writes.
- Memtable: An in-memory data structure for fast reads and writes.
- Sorted String Table (SST): Persistent storage for key-value pairs.
The embedded
example demonstrates how to use ShorterDB as an embedded database.
let mut db = ShorterDB::new(Path::new("./embedded_db")).unwrap();
db.set(b"hello", b"world").unwrap();
let value = db.get(b"hello").unwrap();
assert_eq!(value, Some(b"world".to_vec()));
The grpc
example provides a gRPC interface for remote database access.
#[tonic::async_trait]
impl Basic for DbOperations {
async fn get(&self, request: tonic::Request<GetRequest>) -> Result<tonic::Response<GetResponse>, tonic::Status> {
let key = request.get_ref().key.clone();
let db = self.db.lock().await;
match db.get(key.as_bytes()) {
Ok(Some(value)) => Ok(tonic::Response::new(GetResponse { value: String::from_utf8(value).unwrap() })),
Ok(None) => Err(tonic::Status::not_found("Key not found")),
Err(_) => Err(tonic::Status::internal("Error reading from the database")),
}
}
}
The repl_csv
example imports data from a CSV file and provides a REPL interface.
let mut rdr = ReaderBuilder::new().has_headers(false).from_reader(File::open("data.csv").unwrap());
for result in rdr.records() {
let record = result.unwrap();
db.set(record.get(0).unwrap().as_bytes(), record.get(1).unwrap().as_bytes()).unwrap();
}
ShorterDB uses the thiserror
crate for error handling. Custom error types are defined in errors.rs
.
#[derive(Error, Debug)]
pub enum ShortDBErrors {
#[error("{0}")]
Io(#[from] io::Error),
#[error("Key not found")]
KeyNotFound,
#[error("Value not set")]
ValueNotSet,
#[error("Flush needed from Memtable")]
FlushNeededFromMemTable,
}
The ShorterDB
struct ties together the Memtable and SST components.
pub struct ShorterDB {
pub(crate) memtable: Memtable,
pub(crate) sst: SST,
pub(crate) data_dir: PathBuf,
}
impl ShorterDB {
pub fn set(&mut self, key: &[u8], value: &[u8]) -> Result<()> {
self.memtable.set(key, value)?;
Ok(())
}
}
ShorterDB uses the thiserror
crate for error handling. Custom error types are defined in errors.rs
.
#[derive(Error, Debug)]
pub enum ShortDBErrors {
#[error("{0}")]
Io(#[from] io::Error),
#[error("Key not found")]
KeyNotFound,
#[error("Value not set")]
ValueNotSet,
#[error("Flush needed from Memtable")]
FlushNeededFromMemTable,
}
The ShorterDB
struct ties together the WAL, Memtable, and SST components.
pub struct ShorterDB {
pub(crate) memtable: Memtable,
pub(crate) wal: WAL,
pub(crate) sst: SST,
pub(crate) data_dir: PathBuf,
}
impl ShorterDB {
pub fn set(&mut self, key: &[u8], value: &[u8]) -> Result<()> {
let entry = WALEntry {
key: Bytes::copy_from_slice(key),
value: Bytes::copy_from_slice(value),
};
self.wal.write(&entry)?;
self.memtable.set(key, value)?;
Ok(())
}
}
The embedded
example demonstrates how to use ShorterDB as an embedded database.
let mut db = ShorterDB::new(Path::new("./embedded_db")).unwrap();
db.set(b"hello", b"world").unwrap();
let value = db.get(b"hello").unwrap();
assert_eq!(value, Some(b"world".to_vec()));
The grpc
example provides a gRPC interface for remote database access.
#[tonic::async_trait]
impl Basic for DbOperations {
async fn get(&self, request: tonic::Request<GetRequest>) -> Result<tonic::Response<GetResponse>, tonic::Status> {
let key = request.get_ref().key.clone();
let db = self.db.lock().await;
match db.get(key.as_bytes()) {
Ok(Some(value)) => Ok(tonic::Response::new(GetResponse { value: String::from_utf8(value).unwrap() })),
Ok(None) => Err(tonic::Status::not_found("Key not found")),
Err(_) => Err(tonic::Status::internal("Error reading from the database")),
}
}
}
The repl_csv
example imports data from a CSV file and provides a REPL interface.
let mut rdr = ReaderBuilder::new().has_headers(false).from_reader(File::open("data.csv").unwrap());
for result in rdr.records() {
let record = result.unwrap();
db.set(record.get(0).unwrap().as_bytes(), record.get(1).unwrap().as_bytes()).unwrap();
}
- Performance is not optimized for production use.
- Limited concurrency support.
- No advanced features like compression or compaction.
- Add support for compression.
- Implement advanced compaction strategies.
- Improve concurrency and parallelism.
ShorterDB is built using a modular architecture that separates concerns into distinct components:
The WAL ensures durability by logging all write operations before they are applied to the in-memory Memtable
. This guarantees that data can be recovered in case of a crash.
pub(crate) struct WAL {
path: PathBuf,
file: File,
}
impl WAL {
pub(crate) fn write(&mut self, entry: &WALEntry) -> io::Result<()> {
self.file.write_all(&entry.key.len().to_le_bytes())?;
self.file.write_all(entry.key.as_ref())?;
self.file.write_all(&entry.value.len().to_le_bytes())?;
self.file.write_all(entry.value.as_ref())?;
self.file.flush()?;
Ok(())
}
}
The Memtable
is an in-memory data structure that stores key-value pairs. It uses a SkipMap
for efficient lookups and maintains a size limit to trigger flushing to SSTs.
pub(crate) struct Memtable {
pub(crate) memtable: Arc<SkipMap<Bytes, Bytes>>,
pub(crate) size: u64,
}
impl Memtable {
pub(crate) fn set(&mut self, key: &[u8], value: &[u8]) -> Result<()> {
self.memtable.insert(Bytes::copy_from_slice(key), Bytes::copy_from_slice(value));
self.size += 1;
if self.size >= 256 {
return Err(ShortDBErrors::FlushNeededFromMemTable);
}
Ok(())
}
}
The SST is a persistent, sorted, and immutable data structure stored on disk. It is used for long-term storage of key-value pairs.
pub(crate) struct SST {
pub(crate) dir: PathBuf,
pub(crate) levels: Vec<PathBuf>,
pub(crate) queue: VecDeque<Memtable>,
}
impl SST {
pub(crate) fn set(&mut self) {
let mem = self.queue.pop_front().unwrap();
for entry in mem.memtable.iter() {
let key = entry.key();
let value = entry.value();
let mut path_of_kv_file = self.dir.clone();
path_of_kv_file.push("l0");
path_of_kv_file.push(bytes_to_string(key));
let mut file = File::create_new(&path_of_kv_file);
file.unwrap().write_all(value).unwrap();
}
}
}
- Performance is not optimized for production use.
- Limited concurrency support.
- No advanced features like compression or compaction.
- Add support for compression.
- Implement advanced compaction strategies.
- Improve concurrency and parallelism.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a clear description of your changes.
ShorterDB is a simple and modular key-value store designed for learning and experimentation. While it may not match the performance of production-grade systems, it provides a clear and extensible implementation of database concepts. Explore the examples to get started!