A lightweight, educational implementation of an LSM (Log-Structured Merge-Tree) storage engine in Python. This project is built for learning purposes to understand how LSM trees work in real-world databases like RocksDB, LevelDB, and others.
- Simple key-value store with PUT, GET, DELETE operations
- In-memory memtable with configurable capacity
- Automatic flushing of memtable to disk as SSTable when full
- SSTable compaction to handle disk space efficiency
- REPL interface for interactive usage
This implementation follows the standard LSM tree architecture:
- Writes go to an in-memory memtable first
- When the memtable reaches capacity, it's sorted and flushed to disk as an SSTable
- Read operations check the memtable first, then scan through SSTables (newest to oldest)
- Periodic compaction merges SSTables to reclaim space and improve read performance
# Install dependencies
pip install -r requirements.txt
# Run the REPL interface
python main.py
In the REPL interface:
PUT key%value
- Store a key-value pairGET key
- Retrieve a value by keyDELETE key
- Delete a key-value pairhelp
- Display available commandsquit
- Exit the program
Example:
> PUT fruit%apple
Added fruit, apple successfully
> GET fruit
GET fruit -> apple
main.py
- REPL interface and command parsingtiny_lsm/
- Core implementationengine.py
- Main storage engine implementationmemtable.py
- In-memory table implementationtests/
- Test suite
This is a toy implementation meant for educational purposes:
- No real concurrency controls
- Simple file format for SSTables
- Linear search through SSTables (no indexing or bloom filters)
- Basic compaction strategy
- No recovery mechanism
pytest