An LSM storage engine generally contains three parts:
- Write-ahead log to persist temporary data for recovery.
- SSTs on the disk to maintain an LSM-tree structure.
- Mem-tables in memory for batching small writes.
The storage engine generally provides the following interfaces:
- Put(key, value): store a key-value pair in the LSM tree.
- Delete(key): remove a key and its corresponding value.
- Get(key): get the value corresponding to a key.
- Scan(range): get a range of key-value pairs.
To ensure persistence:
- Sync(): ensure all the operations before sync are persisted to the disk.
Types of compaction algorithms:
- Leveled compaction
The write path of LSM contains four steps:
- Write the key-value pair to the write-ahead log so that it can be recovered after the storage engine crashes.
- Write the key-value pair to memtable. After (1) and (2) are completed, we can notify the user that the write operation is completed.
- (In the background) When a mem-table is full, we will freeze them into immutable mem-tables and flush them to the disk as SST files in the background.
- (In the background) The engine will compact some files in some levels into lower levels to maintain a good shape for the LSM tree so that the read amplification is low.
Read path:
- Find key in memtables(new to old)
- We will first probe all the mem-tables from the latest to the oldest.
- If the key is not found, we will then search the entire LSM tree containing SSTs to find the data.
There are two types of read: lookup and scan. Lookup finds one key in the LSM tree, while scan iterates all keys within a range in the storage engine. We will cover both of them throughout the course.