Python bindings for libCacheSim, a high-performance cache simulator and analysis library.
- English Documentation - Complete API reference, tutorials, and examples
- Chinese Documentation (中文文档) - Complete API reference, tutorials, and examples
Binary installers for the latest released version are available at the Python Package Index (PyPI).
```bash
pip install libcachesim
```
If there are no wheels suitable for your environment, consider building from source.
```bash
bash scripts/install.sh
```
Run all tests to ensure the package works.
```bash
python -m pytest tests/
```
- High-Performance Cache Simulation: Built on the proven libCacheSim C++ library
- Multiple Cache Algorithms: LRU, LFU, FIFO, ARC, S3FIFO, Sieve, TinyLFU, and more
- Trace Processing: Support for various trace formats (CSV, binary, Oracle, etc.)
- Synthetic Workload Generation: Zipf, uniform, and custom distributions
- Trace Analysis: Comprehensive workload analysis and visualization tools
- Custom Cache Policies: Implement new algorithms using Python hooks
- Multi-language Documentation: English and Chinese documentation with examples
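For instance, the Zipf workloads mentioned above skew requests toward a few hot objects. As a stdlib-only illustration of that skew (the `zipf_ids` function below is our own sketch, not the library's generator API):

```python
import random
from collections import Counter

def zipf_ids(num_objects: int, alpha: float, n: int, seed: int = 42):
    """Sample n object IDs from a Zipf-like distribution over ranks 1..num_objects."""
    rng = random.Random(seed)
    # Rank r gets weight 1 / r^alpha, so low ranks are requested far more often
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    return rng.choices(range(1, num_objects + 1), weights=weights, k=n)

ids = zipf_ids(num_objects=100, alpha=1.0, n=10_000)
counts = Counter(ids)
# Rank 1 should dominate rank 100 by roughly two orders of magnitude
print(counts[1], counts[100])
```

Such skewed traces are what make recency- and frequency-aware policies differ; a uniform trace makes most eviction policies perform alike.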
```python
import libcachesim as lcs

# Create a cache
cache = lcs.LRU(cache_size=1024 * 1024)  # 1MB cache

# Process requests
req = lcs.Request()
req.obj_id = 1
req.obj_size = 100
print(cache.get(req))  # False (first access)
print(cache.get(req))  # True (second access)
```
To simulate with traces, we need to read trace requests correctly. `open_trace` is a unified interface for trace reading, which accepts three parameters:

- `trace_path`: path to the trace file; can be relative or absolute.
- `type` (optional): if not given, the trace type is inferred automatically from the file suffix.
- `params` (optional): if not given, default parameters are applied.
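As a rough sketch of the suffix-based inference described above (the mapping and helper below are illustrative only; the library's actual inference rules may differ):

```python
from pathlib import Path

# Hypothetical suffix -> trace-type mapping, for illustration only
SUFFIX_TO_TYPE = {
    ".oracleGeneral.bin": "ORACLE_GENERAL_TRACE",
    ".csv": "CSV_TRACE",
    ".vscsi": "VSCSI_TRACE",
    ".txt": "PLAIN_TXT_TRACE",
}

def infer_trace_type(trace_path: str) -> str:
    """Pick a trace type by matching the longest known filename suffix."""
    name = Path(trace_path).name
    for suffix, trace_type in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return trace_type
    raise ValueError(f"cannot infer trace type from {trace_path!r}")

print(infer_trace_type("./data/cloudPhysicsIO.oracleGeneral.bin"))
```

Passing `type` explicitly, as in the example below, avoids relying on inference when filenames are nonstandard.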
```python
import libcachesim as lcs

# Open trace and process efficiently
reader = lcs.open_trace(
    trace_path="./data/cloudPhysicsIO.oracleGeneral.bin",
    type=lcs.TraceType.ORACLE_GENERAL_TRACE,
    params=lcs.ReaderInitParam(ignore_obj_size=True),
)
cache = lcs.S3FIFO(cache_size=1024 * 1024)

# Process entire trace efficiently (C++ backend)
obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")
```
```python
cache = lcs.S3FIFO(cache_size=1024 * 1024)

# Process with limits and time ranges
obj_miss_ratio, byte_miss_ratio = cache.process_trace(
    reader,
    start_req=0,
    max_req=1000,
)
print(f"Object miss ratio: {obj_miss_ratio:.4f}, Byte miss ratio: {byte_miss_ratio:.4f}")
```
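For intuition, the two numbers weight misses differently: the object miss ratio counts requests, while the byte miss ratio weights each request by its object size. A stdlib-only sketch (the `miss_ratios` helper is our own illustration, not a library function):

```python
def miss_ratios(requests):
    """requests: list of (was_miss: bool, obj_size: int) pairs."""
    total = len(requests)
    total_bytes = sum(size for _, size in requests)
    miss = sum(1 for was_miss, _ in requests if was_miss)
    miss_bytes = sum(size for was_miss, size in requests if was_miss)
    return miss / total, miss_bytes / total_bytes

# One large-object miss dominates the byte ratio but not the object ratio
obj_mr, byte_mr = miss_ratios([(True, 1000), (False, 10), (False, 10)])
print(f"{obj_mr:.4f} {byte_mr:.4f}")  # 0.3333 0.9804
```

This is why size-aware policies can look much better on byte miss ratio than on object miss ratio, and vice versa.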
Implement custom cache replacement algorithms using pure Python functions - no C/C++ compilation required.
The `PluginCache` allows you to define custom caching behavior through Python callback functions. This is perfect for:
- Prototyping new cache algorithms
- Educational purposes and learning
- Research and experimentation
- Custom business logic implementation
You need to implement these callback functions:
- `init_hook(cache_size: int) -> Any`: Initialize your data structure
- `hit_hook(data: Any, obj_id: int, obj_size: int) -> None`: Handle cache hits
- `miss_hook(data: Any, obj_id: int, obj_size: int) -> None`: Handle cache misses
- `eviction_hook(data: Any, obj_id: int, obj_size: int) -> int`: Return the object ID to evict
- `remove_hook(data: Any, obj_id: int) -> None`: Clean up when an object is removed
- `free_hook(data: Any) -> None`: [Optional] Final cleanup
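To make the callback protocol concrete, here is a minimal pure-Python driver that mimics, in simplified form, the order in which a plugin cache invokes these hooks. It is an illustration only: it counts objects rather than bytes and is not the library's actual engine.

```python
class MiniPluginCache:
    """Toy driver: calls user hooks in the order a plugin cache would."""

    def __init__(self, capacity, init_hook, hit_hook, miss_hook,
                 eviction_hook, remove_hook):
        self.capacity = capacity          # max number of objects (simplified)
        self.resident = set()             # object IDs currently cached
        self.data = init_hook(capacity)   # user-managed state
        self.hit_hook = hit_hook
        self.miss_hook = miss_hook
        self.eviction_hook = eviction_hook
        self.remove_hook = remove_hook

    def get(self, obj_id, obj_size=1):
        if obj_id in self.resident:
            self.hit_hook(self.data, obj_id, obj_size)
            return True
        # Miss: evict until there is room, then admit the new object
        while len(self.resident) >= self.capacity:
            victim = self.eviction_hook(self.data, obj_id, obj_size)
            self.remove_hook(self.data, victim)
            self.resident.discard(victim)
        self.miss_hook(self.data, obj_id, obj_size)
        self.resident.add(obj_id)
        return False

# Drive it with trivial FIFO hooks
fifo = MiniPluginCache(
    capacity=2,
    init_hook=lambda cap: [],
    hit_hook=lambda q, oid, sz: None,
    miss_hook=lambda q, oid, sz: q.append(oid),
    eviction_hook=lambda q, oid, sz: q[0],
    remove_hook=lambda q, oid: q.remove(oid),
)
results = [fifo.get(oid) for oid in [1, 2, 1, 3, 1]]
print(results)  # [False, False, True, False, False]
```

The real `PluginCache` follows the same hit/miss/evict/remove sequence, but the C++ core owns residency and size accounting, so your hooks only need to maintain the eviction ordering.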
```python
import libcachesim as lcs
from collections import OrderedDict

# Create a Python hook-based cache
cache = lcs.PluginCache(cache_size=1024 * 1024, cache_name="MyLRU")

# Define LRU policy hooks
def init_hook(cache_size):
    return OrderedDict()  # Track access order

def hit_hook(lru_dict, obj_id, obj_size):
    lru_dict.move_to_end(obj_id)  # Move to most recent

def miss_hook(lru_dict, obj_id, obj_size):
    lru_dict[obj_id] = True  # Add to end

def eviction_hook(lru_dict, obj_id, obj_size):
    return next(iter(lru_dict))  # Return least recent

def remove_hook(lru_dict, obj_id):
    lru_dict.pop(obj_id, None)

# Set the hooks
cache.set_hooks(init_hook, hit_hook, miss_hook, eviction_hook, remove_hook)

# Use it like any other cache
req = lcs.Request()
req.obj_id = 1
req.obj_size = 100
hit = cache.get(req)
print(f"Cache hit: {hit}")  # Should be False (miss)
```
```python
import libcachesim as lcs
from collections import deque
from contextlib import suppress

cache = lcs.PluginCache(cache_size=1024, cache_name="CustomFIFO")

def init_hook(cache_size):
    return deque()  # Use deque for FIFO order

def hit_hook(fifo_queue, obj_id, obj_size):
    pass  # FIFO doesn't reorder on hit

def miss_hook(fifo_queue, obj_id, obj_size):
    fifo_queue.append(obj_id)  # Add to end of queue

def eviction_hook(fifo_queue, obj_id, obj_size):
    return fifo_queue[0]  # Return first item (oldest)

def remove_hook(fifo_queue, obj_id):
    with suppress(ValueError):
        fifo_queue.remove(obj_id)

# Set the hooks and test
cache.set_hooks(init_hook, hit_hook, miss_hook, eviction_hook, remove_hook)
req = lcs.Request(obj_id=1, obj_size=100)
hit = cache.get(req)
print(f"Cache hit: {hit}")  # Should be False (miss)
```
- FIFO: First-In-First-Out
- LRU: Least Recently Used
- LFU: Least Frequently Used
- LFUDA: LFU with Dynamic Aging
- Clock: Clock/Second-chance algorithm
- QDLP: Queue Demotion with Lazy Promotion
- S3FIFO: Simple, scalable FIFO with three static queues (recommended for most workloads)
- Sieve: High-performance eviction algorithm
- ARC: Adaptive Replacement Cache
- TwoQ: Two-Queue algorithm
- SLRU: Segmented LRU
- TinyLFU: Frequency-based admission policy
- WTinyLFU: Windowed TinyLFU
- LeCaR: Learning Cache Replacement (adaptive)
- Cacheus: Adaptive learning-based replacement (successor to LeCaR)
- LRB: Learning Relaxed Belady (if enabled)
- GLCache: Group-level learned cache
- ThreeLCache: Three-level cache hierarchy (if enabled)
- Belady: Optimal offline algorithm
- BeladySize: Size-aware optimal algorithm
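Belady evicts the object whose next reference is farthest in the future, which is why it serves as an offline lower bound on the miss ratio. A stdlib-only sketch of the idea on a unit-size trace (not the library's implementation, which is far more efficient):

```python
def belady_misses(trace, capacity):
    """Count misses under Belady's optimal policy (unit-size objects)."""
    cache = set()
    misses = 0
    for i, obj in enumerate(trace):
        if obj in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # Evict the cached object reused farthest in the future (or never)
            def next_use(o):
                for j in range(i + 1, len(trace)):
                    if trace[j] == o:
                        return j
                return float("inf")
            cache.discard(max(cache, key=next_use))
        cache.add(obj)
    return misses

trace = [1, 2, 3, 1, 2, 4, 1, 2]
print(belady_misses(trace, capacity=2))  # 6
print(belady_misses(trace, capacity=3))  # 4
```

Because Belady needs the future of the trace, it is only usable offline; online policies like those above try to approximate it from past behavior.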
```python
import libcachesim as lcs

# All algorithms use the same unified interface
cache_size = 1024 * 1024  # 1MB
lru_cache = lcs.LRU(cache_size)
s3fifo_cache = lcs.S3FIFO(cache_size)
sieve_cache = lcs.Sieve(cache_size)
arc_cache = lcs.ARC(cache_size)

# All caches work identically
req = lcs.Request()
req.obj_id = 1
req.obj_size = 100
hit = lru_cache.get(req)
print(hit)
```
```python
import libcachesim as lcs

def compare_algorithms(trace_path):
    algorithms = ["LRU", "S3FIFO", "Sieve", "ARC"]
    for algo_name in algorithms:
        # Open a fresh reader per run so each algorithm sees the full trace
        reader = lcs.open_trace(trace_path, lcs.TraceType.VSCSI_TRACE)
        cache = getattr(lcs, algo_name)(cache_size=1024 * 1024)
        obj_miss_ratio, byte_miss_ratio = cache.process_trace(reader)
        print(f"{algo_name}\t\tObj: {obj_miss_ratio:.4f}, Byte: {byte_miss_ratio:.4f}")

compare_algorithms("./data/cloudPhysicsIO.vscsi")
```
```python
import time

import libcachesim as lcs

def benchmark_cache(cache, num_requests=100000):
    """Benchmark cache performance"""
    start_time = time.time()
    for i in range(num_requests):
        req = lcs.Request()
        req.obj_id = i % 1000  # Working set of 1000 objects
        req.obj_size = 100
        cache.get(req)
    end_time = time.time()
    throughput = num_requests / (end_time - start_time)
    print(f"Processed {num_requests} requests in {end_time - start_time:.2f}s")
    print(f"Throughput: {throughput:.0f} requests/sec")

# Compare performance
lru_cache = lcs.LRU(cache_size=1024 * 1024)
s3fifo_cache = lcs.S3FIFO(cache_size=1024 * 1024)
print("LRU Performance:")
benchmark_cache(lru_cache)
print("\nS3FIFO Performance:")
benchmark_cache(s3fifo_cache)
```
```python
import libcachesim as lcs

# Supported trace types
trace_types = {
    "oracle": lcs.TraceType.ORACLE_GENERAL_TRACE,
    "csv": lcs.TraceType.CSV_TRACE,
    "vscsi": lcs.TraceType.VSCSI_TRACE,
    "txt": lcs.TraceType.PLAIN_TXT_TRACE,
}

# Process traces in different formats with different caches
caches = [
    lcs.LRU(cache_size=1024 * 1024),
    lcs.S3FIFO(cache_size=1024 * 1024),
    lcs.Sieve(cache_size=1024 * 1024),
]
for i, cache in enumerate(caches):
    # Open a fresh reader per cache so every run starts at the beginning
    oracle_reader = lcs.open_trace("./data/cloudPhysicsIO.oracleGeneral.bin", trace_types["oracle"])
    txt_reader = lcs.open_trace("./data/cloudPhysicsIO.txt", trace_types["txt"])
    miss_ratio_oracle = cache.process_trace(oracle_reader)[0]
    miss_ratio_txt = cache.process_trace(txt_reader)[0]
    print(f"Cache {i} miss ratio: {miss_ratio_oracle:.4f}, {miss_ratio_txt:.4f}")
```
Import Error: Make sure the libCacheSim C++ library is built first:

```bash
cmake -G Ninja -B build && ninja -C build
```

Performance Issues: Use `process_trace()` for large workloads instead of individual `get()` calls for better performance.

Memory Usage: Monitor cache statistics (`cache.occupied_byte`) and ensure cache size limits are appropriate for your system.

Custom Cache Issues: Validate your custom implementation against built-in algorithms using the test functions above.

Install with uv: Building from source with `uv` fails because the source distribution is incomplete, so force installation of a prebuilt wheel via `uv pip install libcachesim --only-binary=:all:`.
- Check the main documentation for detailed guides
- Open issues on GitHub
- Review examples in the main repository