Simple Redis Implementation in C/C++

How to build and run the project

Build

At root level of the project, run:
1. cmake . -B build
2. cmake --build build

Run

Then in one terminal window/session, run the server ./build/src/server
Open a new terminal window/session, run the client with arguments: ./build/src/client <args>
- one example is to run the Python test script itself: ./src/test_commands.py

Notes

Sockets

server-side vs. client-side
Two kinds of mainstream sockets:
- DARPA Internet Socket
- Unix Socket

Internet Socket

Two mainstream types:
- Stream Sockets - SOCK_STREAM
  - reliable two-way connected communication streams
  - uses TCP
- Datagram Sockets - SOCK_DGRAM
  - a.k.a. connectionless socket
  - uses UDP

Server-Side

socket() - syscall that returns an fd
bind() - associates an address to a socket fd
listen() - enables user to accept connections to that address
accept() - takes a listening fd
- when a client makes a connection to the listening address,
- accept() returns an fd that represents the connection socket
read() - receives data from a TCP connection
write() - sends data
close() - destroys the resource referred by the fd and recycles fd
send() and recv() might offer better control over data transmission than read() and write()

Client-Side

connect()
- takes a socket fd and address
- makes a TCP connection to that address

Protocol Parsing

Used to spilt requests apart from the TCP byte stream
Current scheme:

+-----+-----+-----+-----+----------
| len | msg1 | len | msg2 | more...
+-----+-----+-----+-----+----------

Two parts:
- 4-byte little-endian integer - length of the request
- variable-length request

Protocol Desgin

Text vs. Binary

Text
- human-readable
- HTTP
- hard to parse - variable-length strings
Avoid unnecessary variable-length components
protocol parsing currently uses 2 read() syscalls
- could use a buffered I/O

Event Loop and Non-blocking I/O

Three ways to handle concurrent connections in server-side network programming
- forking
  - creates new processes for each client connection
- multi-threading
  - uses threads instead of processes
- event loops
  - polling
  - non-blocking I/O
Use poll to determine which fd can be operated without blocking
when an I/O operation is done on an fd, it should be non-blocking
In blocking mode,
- read blocks the caller when there are no data in the kernel
- write blocks when the write buffer is full
- accept blocks when there are no new connections in the kernel queue
In non-blocking mode
- operations either success without blocking, or
- fail with errno EAGAIN - not ready
- non-blocking operations that fail with EAGAIN must be retried after the readiness was notified by poll
poll is the SOLE blocking operation in an event loop
All blocking networking IO APIs (read, write, accept) have a nonblocking mode
APIs that do not have a non-blocking mode (gethostbyname, disk IOs) should be performed in thread pools
Timers should also implemented within the event loop since we cannot sleep inside
Also select and epoll on Linux
- epoll consists of 3 syscalls:
  - epoll_create
  - epoll_wait
  - epoll_ctl
- stateful API
- used to manipulate an fd set create by epoll_create, which is operated upon by epoll_wait

Implementation

Need buffers for reading/writing
- in nonblocking mode, IO operations are often deferred
poll()
- poll() is actually horribly slow for large number of connections
- Use a dedicated event lib such as libevent instead
- ask the OS to let us know when some data is ready to read on which sockets
Plan of using poll():
- Keep an array of struct pollfds with info about:
  - which socket descriptors should be monitored
  - what kind of events we want to monitor for
- OS will block on the poll() call until one of the events occurs or user-specified timeout occurs
For a request/response protocol, clients are not limited to sending one request and waiting for the response at a time
- could save some latency
- **pipelining**

`get`, `set`, `del`

command: list of strings, such as set key val
Scheme of the command:
- nstr - number of strings - 4 bytes
- len - length of the following string - 4 bytes

+------+-----+------+-----+------+-----+-----+------+
| nstr | len | str1 | len | str2 | ... | len | strn |
+------+-----+------+-----+------+-----+-----+------+

Scheme of the response:
- res - 4-byte status code
- data - response string

+-----+---------
| res | data...
+-----+---------

Data Structure: Hashtables

Tow kinds of hashtables:
- chaining
- open addressing
Main difference: collision resolution
- open addressing: seeks another free slot in the event of a collision
- chaining: groups conflicting keys with a linked list

Progressive Resizing

When needing more space for the hashtable, we resize
To avoid stalling the server, keep two hashtables and gradually move nodes between them

Data Serialization

The **Type-Length-Value (TLV)** scheme

The AVL Tree

Redis supports querying sorted data by rank
Redis actually uses skiplist for storing sorted data
Resources:

The AVL Tree and the Sorted Set

Rank-Based Queries
- the primary use case of sorted sets
- range query - just a regular binary tree look-up, followed by an offset operation
- however, the offset operation is not a regular binary tree walk
Sorted Set
- a Redis data type
- https://redis.io/docs/data-types/sorted-sets/

Event Loop and Timers

Every networked application needs to handle timeouts
we need timers
- timeout value of poll should be the timeout value of the nearest timer (why..?)
Add timers to kick out idle TCP connections
For each connection there is a timer, set to a fixed timeout into the future
- every time I/O activities occur on the connection, timer is renewed to a fixed timeout
When a timer is renewed, it becomes the most distant one

Heap Data Structure and TTL

TTL - one way to manage the size of cache
Heap data structure
- binary tree, packed in to array
- parent is no bigger than children

Thread Pool & Async Tasks

thread from the thread pool consumes tasks form a queue and executes them
pthread API
- pthread_mutex_t
- pthread_cond_t
  - wakes up consumer threads only when the queue is not empty
  - consumer threads should be sleeping when idle

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
src		src
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Simple Redis Implementation in C/C++

How to build and run the project

Build

Run

Notes

Sockets

Internet Socket

Server-Side

Client-Side

Protocol Parsing

Protocol Desgin

Text vs. Binary

Event Loop and Non-blocking I/O

Implementation

`get`, `set`, `del`

Data Structure: Hashtables

Progressive Resizing

Data Serialization

The AVL Tree

The AVL Tree and the Sorted Set

Event Loop and Timers

Heap Data Structure and TTL

Thread Pool & Async Tasks

About

Uh oh!

Releases

Packages

Languages

rollschild/redis-c

Folders and files

Latest commit

History

Repository files navigation

Simple Redis Implementation in C/C++

How to build and run the project

Build

Run

Notes

Sockets

Internet Socket

Server-Side

Client-Side

Protocol Parsing

Protocol Desgin

Text vs. Binary

Event Loop and Non-blocking I/O

Implementation

get, set, del

Data Structure: Hashtables

Progressive Resizing

Data Serialization

The AVL Tree

The AVL Tree and the Sorted Set

Event Loop and Timers

Heap Data Structure and TTL

Thread Pool & Async Tasks

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`get`, `set`, `del`

Packages