What is ccask? An ultra‑fast, append‑only, crash‑safe KV‑engine in C, inspired by Riak's Bitcask and designed to be embedded in high‑performance systems.
ccask is a pure‑C re‑implementation of the Bitcask log‑structured key/value store. It provides a fast, append‑only on‑disk storage engine with an in‑memory hash index (the "keydir"), hint files for fast recovery, and support for threading and CRC‑protected records.
Inspired by Bitcask: A Log‑Structured Hash Table for Fast Key/Value Data
-
🗒️ Append‑Only Storage
Sequential write pattern for maximum throughput on modern SSDs and HDDs. -
⚡ High-Performance Write Path
All writes are append-only and non-blocking, using memory-mapped ring buffers. -
🧠 In‑Memory Key Directory
Fast O(1) lookups via authash
‑based hash table that maps each key to its on‑disk location. -
💾 Hint Files for Fast Recovery
Per‑segment hint files dramatically reduce startup times by avoiding full log scans. -
🔒 Data Integrity with CRC32
Every record includes a CRC32 checksum, verified on read to detect on‑disk corruption. -
🔐 Thread‑Safe I/O
Fine‑grained POSIX locks plus a file‑descriptor invalidator ensure safe concurrent access. -
🔧 Minimal C Library API
Simple, well‑typed functions forinit
,shutdown
,put
,get
,delete
, and key iteration—easy to embed in any C project.
- C11 compiler (GCC 7+, Clang 6+, or equivalent)
- C++17 compiler (for internal logging components)
- CMake 3.15+
- Dependencies:
zlib
(required)spdlog
(optional, for logging - can be disabled)pthreads
(usually available on POSIX systems)
Best for embedding ccask directly in your project.
Option A: As a Git Submodule
# In your project root
git submodule add https://github.com/ShardulNalegave/ccask.git third_party/ccask
git submodule update --init --recursive
Option B: Clone directly
# In your project's third_party directory
git clone https://github.com/ShardulNalegave/ccask.git
Then in your CMakeLists.txt
:
# Add ccask subdirectory
add_subdirectory(third_party/ccask)
# Link to your target
target_link_libraries(your_app PRIVATE ccask)
Build your project:
mkdir build && cd build
cmake .. -DCCASK_ENABLE_LOGGING=ON
cmake --build .
Best for system-wide installation and multiple projects using ccask.
Step 1: Install ccask
git clone https://github.com/ShardulNalegave/ccask.git
cd ccask
mkdir build && cd build
# Configure
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DCCASK_ENABLE_LOGGING=ON \
-DCCASK_BUILD_SHARED=OFF
# Build and install
make -j$(nproc)
sudo make install
Step 2: Use in your project
# In your CMakeLists.txt
find_package(ccask REQUIRED)
target_link_libraries(your_app PRIVATE ccask::ccask)
Step 3: Build your project
mkdir build && cd build
cmake ..
cmake --build .
For complete control over the build process.
Step 1: Build ccask
git clone https://github.com/ShardulNalegave/ccask.git
cd ccask
mkdir build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCCASK_ENABLE_LOGGING=ON
make -j$(nproc)
This produces libccask.a
in the build/
directory.
Step 2: Compile your application
gcc -std=c11 \
-I/path/to/ccask/include \
-L/path/to/ccask/build \
your_app.c \
-lccask -lz -lpthread -lspdlog -lstdc++ \
-o your_app
Note: If you built with -DCCASK_ENABLE_LOGGING=OFF
, omit -lstdc++
and -lspdlog
.
ccask can be used as an embedded key-value database in any other application. The public API is accessible through the ccask.h
header file.
#include "ccask/core.h"
int main() {
ccask_options_t opts;
opts.data_dir = "<path-to-data-directory>";
opts.writer_ringbuf_capacity = 100;
opts.datafile_rotate_threshold = 100;
ccask_init(opts);
/* You can run get, put, delete, etc operations here */
ccask_shutdown();
return 0;
}
ccask
provides the following configuration options when calling ccask_init
:-
data_dir
: Directory where all datafiles are storedwriter_ringbuf_capacity
: Capacity of the Writer Ring-Bufferdatafile_rotate_threshold
: Size after which datafiles must be rotated (Note: This doesn't have any effect on existing datafiles)
ccask
is organized into discrete modules, each responsible for a clear portion of functionality:
-
core
Exposes the public C API (init
,shutdown
,put
,get
,delete
,iterator
) and orchestrates startup, shutdown, and thread lifecycles. -
files
Manages on‑disk datafiles and hintfiles: scanning the directory, opening/closing FDs, file rotation, and low‑level I/O primitives. -
keydir
Maintains the in‑memory hash table. Handles recovery from hintfiles and datafiles during bootup. -
reader
Implements synchronous read operations (get
, iteration) by consulting the keydir, issuingpreadv
calls, and spawning per‑file FD invalidator threads to close idle descriptors. -
writer
Runs in its own thread: pulls pre‑serialized records from the writer_ringbuf, appends them viawritev
to the active datafile, triggers rotation when the size threshold is reached, and invokes hintfile generation for closed segments. -
writer_ringbuf
A fixed‑capacity, lock‑protected MPSC queue ofstruct iovec[3]
records, decoupling clientput()
calls from disk writes for high throughput. -
hint
When the writer rotates a datafile, this module spawns a dedicated thread to scan the closed file and emit a compact<id>.hint
file, used for fast keydir rebuilding on restart.
flowchart LR
subgraph Core
CORE["ccask_core<br/>(API & orchestration)"]
end
subgraph Storage
FILES["ccask_files<br/>(datafile & hintfile I/O)"]
KEYDIR["ccask_keydir<br/>(in‑memory index)"]
end
subgraph I/O
RBUF["writer_ringbuf<br/>(MPSC iovec queue)"]
READER["ccask_reader<br/>(read interface)"]
WRITER["ccask_writer<br/>(write thread)"]
HINT["ccask_hint<br/>(hintfile generator)"]
end
CORE --> FILES
CORE --> KEYDIR
CORE --> READER
CORE --> RBUF
RBUF --> WRITER
WRITER --> FILES
WRITER --> HINT
READER --> KEYDIR
READER --> FILES
On ccask_init(options)
:
- files scans provided data-path, builds the file linked-list + hash-table, detects .data and .hint pairs, opens or creates the active segment.
- keydir reads every
<id>.hint
to populate its hash table, then scans the active .data file for any missing entries. - writer thread is spawned, waiting on the ring buffer.
On ccask_shutdown()
:
- Signal writer to flush and join.
- Close all open FDs in files.
- Free the keydir hash table.
- Destroy the ring buffer and any remaining threads.
ccask
provides both non-blocking (put
, delete
) blocking (put_blocking
, delete_blocking
) variants for write operations.
While the blocking calls are quite straightforward, the non-blocking variants use a writer_ringbuf
to enqueue records which are then written by a dedicated writer
thread. This allows for higher throughput.
- Caller invokes
ccask_put(key, key_size, value, value_size)
. core
serializes a datafile record into a struct iovec[3].- Enqueues the iovec array into the
writer_ringbuf
. writer
thread wakes, pops the record, and writev‑appends to the active .data file.- If the file size exceeds the threshold, files rotates the segment:
- Close old segment, rename it to
<id>.data
. - Create a fresh active
<id+1>.active
. - Signal
writer
to continue on the new file.
- Close old segment, rename it to
writer
invokes the Hintfile generator to spawn a thread that creates<id>.hint
for the closed segment.
flowchart LR
CLIENT["ccask_put()"]
CLIENT --> CORE["serialize -> iovec[3]"]
CORE --> RBUF["enqueue into ringbuf"]
RBUF -- "notified of records in buffer" --> WRITER["Writer Thread"]
WRITER --> ROT["if size > max: rotate"]
ROT --> HT["spawn hintfile thread"]
ROT --> WRITER
WRITER --> FILES["writev to active .data"]
HT --> HT_DONE["create <id>.hint"]
- Caller invokes
ccask_get(key, key_size, &out_record)
. reader
module looks up the latest key-directory record for provided key.- Issues a
preadv
on the correct datafile viafiles
to read header, key, and value in one system call. - Spawns an FD invalidator thread for that file (if not already running) to close the descriptor after idle timeout.
- Verifies CRC32, returns the value and metadata to the caller.
Note:- Read calls (get) are blocking calls, they will block the caller thread till the value is read.
flowchart LR
CLIENT["ccask_get()"]
CLIENT --> KEYDIR["lookup keydir"]
KEYDIR --> FILES["preadv"]
FILES --> IT["spawn invalidator(fd) if not already running"]
IT --> VERIFY["CRC32 check"]
VERIFY --> RETURN["return stored K/V record with metadata"]
This project is licensed under the GNU Lesser General Public License v3.0 (LGPL-3.0).
You may use, distribute, and modify ccask
under the terms of the LGPL as published by the Free Software Foundation.
For more information, see the COPYING.LESSER and COPYING files.