A Python package providing two implementations of a time-based storage system for managing events with timestamps. This library is useful for applications that need to efficiently store and query time-based data, such as event logs, time series data, monitoring systems, and schedulers.
- Three storage implementations:
TimeBasedStorage
: Uses a dictionary for simple key-value accessTimeBasedStorageHeap
: Uses a heap for efficient insertion and earliest event accessTimeBasedStorageRBTree
: Uses a Red-Black Tree for balanced performance (O(log n) insertions and efficient range queries)
- Thread-safe variants:
ThreadSafeTimeBasedStorage
: Thread-safe version of TimeBasedStorageThreadSafeTimeBasedStorageHeap
: Thread-safe version of TimeBasedStorageHeapThreadSafeTimeBasedStorageRBTree
: Thread-safe version of TimeBasedStorageRBTree
- Support for:
- Event creation and deletion
- Range queries
- Duration-based queries
- Day-of-week queries
- Earliest/latest event access
- Timestamp collision handling
- Generic typing (store any data type)
Comprehensive documentation is available to help you get the most out of this library:
- Architecture Guide - Design principles, implementation details, and performance considerations
- Code Examples - Practical usage examples and patterns
- Concurrent Use Cases - Real-world scenarios for concurrent access
- Alternative Approaches - Limitations of current implementation and alternative storage strategies
The rest of this README provides an overview of installation, basic usage, and API reference.
pip install time_based_storage
from datetime import datetime, timedelta
from time_based_storage import TimeBasedStorage, TimeBasedStorageHeap
# Create storage instances
storage = TimeBasedStorage[str]() # Type annotation for stored values
heap_storage = TimeBasedStorageHeap[str]()
# Add events with timestamps
storage.add(datetime(2024, 1, 1, 10, 0), "Event 1")
storage.add(datetime(2024, 1, 1, 11, 0), "Event 2")
storage.add(datetime(2024, 1, 1, 12, 0), "Event 3")
# Query events in a time range
start_time = datetime(2024, 1, 1, 10, 30)
end_time = datetime(2024, 1, 1, 11, 30)
events_in_range = storage.get_range(start_time, end_time) # Returns ["Event 2"]
# Query events within a duration (last hour)
duration = 3600 # seconds
recent_events = storage.get_duration(duration)
# Get all events and timestamps
all_events = storage.get_all()
all_timestamps = storage.get_timestamps()
# Remove an event
storage.remove(datetime(2024, 1, 1, 11, 0)) # Removes "Event 2"
# Clear all events
storage.clear()
For multithreaded applications, use the thread-safe variants:
from time_based_storage import ThreadSafeTimeBasedStorage, ThreadSafeTimeBasedStorageHeap
import threading
# Create thread-safe storage
storage = ThreadSafeTimeBasedStorage[int]()
# Use in multiple threads
def producer():
for i in range(10):
storage.add(datetime.now(), i)
def consumer():
# Wait for data with timeout
if storage.wait_for_data(timeout=1.0):
data = storage.get_all()
print(f"Received: {data}")
# Start threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
- Best for: Applications with small to medium datasets and simple access patterns
- Advantages: Efficient range queries, direct index access, simple implementation
- Trade-offs: Slower insertion (O(n)) especially with sorted data
- Best for: Applications needing fast insertion or frequent access to earliest events
- Advantages: Fast insertion (O(log n)), efficient earliest event access (O(1))
- Trade-offs: Less efficient for range queries (O(n log n))
- Best for: Applications requiring balanced performance across operations, especially range queries
- Advantages: Fast insertion (O(log n)), highly efficient range queries (O(log n + k)), maintains performance with sorted data
- Trade-offs: Slightly higher memory overhead, dependency on sortedcontainers package
- Benchmark highlights: Up to 470x faster for small precise range queries, 114x average speedup for range operations
Method | Description | Time Complexity (Standard/Heap/RBTree) |
---|---|---|
add(timestamp, value) |
Add a value at a specific timestamp | O(n) / O(log n) / O(log n) |
get_value_at(timestamp) |
Get value at a specific timestamp | O(1) / O(n) / O(1) |
get_range(start, end) |
Get values in a time range | O(n) / O(n log n) / O(log n + k) |
get_duration(seconds) |
Get values within a duration | O(n) / O(n log n) / O(log n + k) |
remove(timestamp) |
Remove value at a timestamp | O(n) / O(log n) / O(log n) |
clear() |
Remove all values | O(1) / O(1) / O(1) |
size() |
Get number of stored events | O(1) / O(1) / O(1) |
is_empty() |
Check if storage is empty | O(1) / O(1) / O(1) |
get_all() |
Get all stored values | O(1) / O(1) / O(1) |
get_timestamps() |
Get all timestamps | O(1) / O(1) / O(1) |
add_unique_timestamp() |
Add with timestamp collision handling | Varies |
Method | Description | Notes |
---|---|---|
wait_for_data(timeout) |
Wait for data to be available | Blocks until data or timeout |
notify_data_available() |
Notify waiting threads | Called automatically on add |
- Insertion: O(n)
- Range Queries: O(n)
- Duration Queries: O(n)
- Earliest/Latest: O(n)
- Memory Usage: Lower overhead per element
- Insertion: O(log n)
- Range Queries: O(n log n)
- Duration Queries: O(n log n)
- Earliest Event: O(1)
- Latest Event: O(n log n)
- Memory Usage: Moderate overhead
- Insertion: O(log n)
- Range Queries: O(log n + k) where k is the number of items in range
- Duration Queries: O(log n + k)
- Earliest Event: O(log n)
- Latest Event: O(log n)
- Memory Usage: Slightly higher overhead
Benchmark Results (500,000 entries):
- Range query performance: ~114x average speedup over standard implementation
- Small precise range queries (0.01% of data): ~470x faster
- Small range queries (0.1% of data): ~87x faster
- Medium range queries (1% of data): ~12x faster
- Most beneficial for targeted range queries on large datasets
This library is well-suited for:
- Event logging and analysis
- Time series data storage
- Monitoring systems
- Event scheduling
- Message queues with time-based priorities
- Session tracking
For more detailed information about using this library in various scenarios, see:
- Architecture Guide - Learn about the design principles and implementation details
- Code Examples - See practical examples of how to use the library
- Concurrent Use Cases - Explore real-world concurrent access scenarios
- Alternative Approaches - Understand limitations and alternative storage strategies for larger datasets
Run the complete test suite:
# From the project root
cd time_based_storage
python -m pytest tests/ -v
Run tests with code coverage:
cd time_based_storage
python -m pytest tests/ -v --cov=src/time_based_storage --cov-report=html
This will generate an HTML coverage report in the htmlcov
directory.
git clone https://github.com/johnburbridge/python-time-based-experiment.git
cd python-time-based-experiment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e time_based_storage/
pip install -r time_based_storage/requirements-dev.txt
This project uses:
- Black for code formatting
- Flake8 for linting
Apply formatting:
black time_based_storage/src time_based_storage/tests
Check style:
flake8 time_based_storage/src time_based_storage/tests
This project is licensed under the MIT License - see the LICENSE file for details.