feat: mypyc compilation #44

Open: wants to merge 17 commits into base: main
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -17,7 +17,7 @@ repos:
       - id: mixed-line-ending
       - id: trailing-whitespace
   - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: "v0.12.2"
+    rev: "v0.12.3"
     hooks:
       - id: ruff
         args: ["--fix"]
30 changes: 30 additions & 0 deletions Makefile
@@ -48,6 +48,18 @@ install: destroy clean ## Install the project, depe
	@uv sync --all-extras --dev
	@echo "${OK} Installation complete! 🎉"

.PHONY: install-compiled
install-compiled: destroy clean ## Install with mypyc compilation for performance
	@echo "${INFO} Starting fresh installation with mypyc compilation..."
	@uv python pin 3.12 >/dev/null 2>&1
	@uv venv >/dev/null 2>&1
	@echo "${INFO} Installing in editable mode with mypyc compilation..."
	@HATCH_BUILD_HOOKS_ENABLE=1 uv pip install -e .
	@uv sync --all-extras --dev
	@echo "${OK} Performance installation complete! 🚀"
	@echo "${INFO} Verifying compilation..."
	@find sqlspec -name "*.so" | wc -l | xargs -I {} echo "${OK} Compiled {} modules"

.PHONY: destroy
destroy: ## Destroy the virtual environment
	@echo "${INFO} Destroying virtual environment... 🗑️"
@@ -83,6 +95,22 @@ build: ## Build the package
	@uv build >/dev/null 2>&1
	@echo "${OK} Package build complete"

.PHONY: build-performance
build-performance: ## Build package with mypyc compilation
	@echo "${INFO} Building package with mypyc compilation... 📦"
	@HATCH_BUILD_HOOKS_ENABLE=1 uv build >/dev/null 2>&1
	@echo "${OK} Performance package build complete 🚀"

.PHONY: test-mypyc
test-mypyc: ## Test mypyc compilation on individual modules
	@echo "${INFO} Testing mypyc compilation... 🔧"
	@uv run mypyc --check-untyped-defs sqlspec/utils/statement_hashing.py
	@uv run mypyc --check-untyped-defs sqlspec/utils/text.py
	@uv run mypyc --check-untyped-defs sqlspec/utils/sync_tools.py
	@uv run mypyc --check-untyped-defs sqlspec/statement/cache.py
	@echo "${OK} Mypyc compilation tests passed ✨"


.PHONY: release
release: ## Bump version and create release tag
	@echo "${INFO} Preparing for release... 📦"
@@ -108,6 +136,8 @@ clean: ## Cleanup temporary build a
	@find . -name '*~' -exec rm -f {} + >/dev/null 2>&1
	@find . -name '__pycache__' -exec rm -rf {} + >/dev/null 2>&1
	@find . -name '.ipynb_checkpoints' -exec rm -rf {} + >/dev/null 2>&1
	@find . -name '*.so' -exec rm -f {} + >/dev/null 2>&1
	@find . -name '*.c' -exec rm -f {} + >/dev/null 2>&1
	@echo "${OK} Working directory cleaned"
	$(MAKE) docs-clean

217 changes: 217 additions & 0 deletions docs/cache_configuration_guide.md
@@ -0,0 +1,217 @@
# SQLSpec Cache Configuration Guide

SQLSpec provides a three-tier caching system, built on its single-pass pipeline architecture, that can improve performance by up to 90% for common query patterns.

## Three-Tier Cache System

The SQLSpec architecture includes three complementary caching layers:

1. **Base Statement Cache**: Processed SQL objects and pipeline results
2. **Filter Result Cache**: Applied filter transformations and compositions
3. **Optimized Expression Cache**: SQLGlot optimization results with AST sub-expression caching

## Configuration Levels

### 1. Global Cache Configuration

Controls the size and behavior of all statement caches globally:

```python
from sqlspec.statement.cache import CacheConfig, update_cache_config

# Configure cache sizes
config = CacheConfig(
    sql_cache_size=2000,  # Processed SQL statements
    fragment_cache_size=10000,  # AST fragments (WHERE, JOIN, etc.)
    optimized_cache_size=3000,  # Optimized expressions
    sql_cache_enabled=True,  # Enable/disable SQL cache
    fragment_cache_enabled=True,  # Enable/disable fragment cache
    optimized_cache_enabled=True,  # Enable/disable optimization cache
)
update_cache_config(config)

# Disable specific caches by setting size to 0
minimal_config = CacheConfig(
    sql_cache_size=0,  # Disables SQL cache
    fragment_cache_size=5000,  # Fragment cache still active
    optimized_cache_size=0,  # Disables optimization cache
)
update_cache_config(minimal_config)
```
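
To illustrate how a size of 0 can act as an off switch, here is a minimal sketch of a bounded LRU cache. `BoundedCache` and its methods are hypothetical names chosen for this example; they are not taken from SQLSpec, whose internal cache may work differently.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """Hypothetical LRU cache; a max_size of 0 disables storage entirely."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: Hashable, value: Any) -> None:
        if self.max_size <= 0:
            return  # cache disabled: never store anything
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._data)
```

With this design, a disabled cache needs no special casing at call sites: every lookup is simply a miss.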

### 2. Driver-Level Configuration

Set default caching behavior for all statements executed by a driver:

```python
from sqlspec.adapters.sqlite import SqliteConfig
from sqlspec.statement.sql import SQLConfig

# Configure driver with custom statement defaults
config = SqliteConfig(
    connection_config={"database": "mydb.db"},
    statement_config=SQLConfig(
        enable_caching=True,  # Enable statement caching
        enable_parsing=True,  # Enable SQL parsing
        enable_validation=True,  # Enable validation
    ),
    adapter_cache_size=1000,  # Driver's compiled SQL cache
)

# Create driver with configuration
with config.provide_session() as driver:
    # All statements use the driver's default config
    result = driver.execute("SELECT * FROM users")
```

### 3. Statement-Level Override

Override caching for specific statements:

```python
from sqlspec.statement.sql import SQL, SQLConfig

# `driver` is a session obtained from `config.provide_session()`,
# as in the driver-level example.

# Method 1: Create SQL object with custom config
sql = SQL(
    "SELECT * FROM users WHERE id = ?",
    config=SQLConfig(enable_caching=False),  # Disable caching for this statement
)
result = driver.execute(sql, (123,))

# Method 2: Override at execution time
result = driver.execute(
    "SELECT * FROM products",
    _config=SQLConfig(enable_caching=False),  # Override driver's default
)

# Method 3: Using query builders
from sqlspec.statement.builder import Select

query = Select("id", "name").from_("users").where("active = ?", True)
result = driver.execute(
    query,
    _config=SQLConfig(enable_caching=True),  # Force caching even if driver default is False
)
```
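
The override order described above can be sketched as a small resolution function. This is a simplified model, not SQLSpec's code: `CachingOptions` and `resolve_caching` are hypothetical names, and treating the global setting as a plain fallback is an assumption (a globally disabled cache may well win regardless of statement settings).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachingOptions:
    """Hypothetical per-level caching setting; None means "not set here"."""

    enable_caching: Optional[bool] = None


def resolve_caching(
    global_default: bool,
    driver: Optional[CachingOptions] = None,
    statement: Optional[CachingOptions] = None,
) -> bool:
    """Return the effective flag: statement beats driver, driver beats global."""
    for level in (statement, driver):
        if level is not None and level.enable_caching is not None:
            return level.enable_caching
    return global_default
```

For example, `resolve_caching(True, driver=CachingOptions(False), statement=CachingOptions(True))` yields `True`, mirroring Method 3, where a statement forces caching although the driver default is off.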

## Caching Behavior

### What Gets Cached?

1. **SQL Cache**: Stores fully processed SQL statements with their parameters
2. **Fragment Cache**: Stores parsed AST fragments (WHERE clauses, JOINs, subqueries)
3. **Optimized Expression Cache**: Stores optimized/simplified expressions
4. **Base Statement Cache**: Stores parsed base SQL before modifications
5. **Filter Cache**: Stores results of applying filters to statements

### Cache Keys

- SQL statements are cached based on:
  - Raw SQL text
  - Dialect
  - Parameter styles
  - Applied filters
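
As a rough illustration of how these four inputs could be combined into one key, consider the sketch below. `make_cache_key` is a hypothetical helper, and representing filters as strings is an assumption made for brevity; SQLSpec's actual hashing may differ.

```python
import hashlib
from typing import Iterable


def make_cache_key(
    sql: str,
    dialect: str,
    parameter_style: str,
    filters: Iterable[str] = (),
) -> str:
    """Hypothetical key builder: hash every input that affects the processed SQL."""
    parts = [sql, dialect, parameter_style, *filters]
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    payload = "".join(f"{len(part)}:{part}" for part in parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same SQL text under a different dialect or with an extra filter produces a different key, so one cached entry is never reused for a statement that would be processed differently.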

### Performance Considerations

```python
from sqlspec.statement.cache import get_cache_stats, log_cache_stats

# Monitor cache performance
stats = get_cache_stats()
print(f"SQL Cache Hit Rate: {stats.sql_hit_rate:.2%}")
print(f"Fragment Cache Hit Rate: {stats.fragment_hit_rate:.2%}")

# Log detailed statistics
log_cache_stats()
```
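
The `:.2%` format used above implies that a hit rate is a fraction between 0 and 1. A minimal sketch of how such a value might be tracked follows; `HitCounter` is a hypothetical class for illustration and is not the object returned by `get_cache_stats()`.

```python
from dataclasses import dataclass


@dataclass
class HitCounter:
    """Hypothetical hit/miss counter for a single cache."""

    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0  # avoid division by zero
```

A consistently low hit rate on a busy cache usually suggests that the cache is too small for the workload or that statements vary too much to be reused.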

## Best Practices

### 1. Development vs Production

```python
from sqlspec.statement.cache import CacheConfig
from sqlspec.statement.sql import SQLConfig

# Development: Disable caching for easier debugging
dev_config = SQLConfig(enable_caching=False)

# Production: Enable all caches with appropriate sizes
prod_config = CacheConfig(
    sql_cache_size=5000,
    fragment_cache_size=20000,
    optimized_cache_size=5000,
)
```

### 2. Memory-Constrained Environments

```python
# Reduce cache sizes for low-memory environments
low_memory_config = CacheConfig(
    sql_cache_size=100,
    fragment_cache_size=500,
    optimized_cache_size=100,
)
```

### 3. High-Performance Scenarios

```python
# Maximize cache sizes for read-heavy workloads
high_perf_config = CacheConfig(
    sql_cache_size=10000,
    fragment_cache_size=50000,
    optimized_cache_size=10000,
)

# Pre-warm caches with common queries, each paired with sample parameters
common_queries = [
    ("SELECT * FROM users WHERE active = ?", (True,)),
    ("SELECT id, name FROM products ORDER BY created_at DESC LIMIT ?", (10,)),
]
for query, params in common_queries:
    driver.execute(query, params)  # Warm up the cache
```

### 4. Selective Caching

```python
# Cache only expensive queries
def should_cache(sql: str) -> bool:
    # Cache complex queries with JOINs, CTEs, or aggregations
    complex_keywords = ["JOIN", "WITH", "GROUP BY", "HAVING"]
    return any(keyword in sql.upper() for keyword in complex_keywords)

# Use conditional caching
sql = "SELECT COUNT(*) FROM orders JOIN users ON orders.user_id = users.id"
config = SQLConfig(enable_caching=should_cache(sql))
result = driver.execute(sql, _config=config)
```
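
Plain substring matching also fires on identifiers that merely contain a keyword, such as a column named `withdrawals`. One possible refinement, sketched here with the hypothetical name `should_cache_strict`, matches whole words only. It is still a text heuristic: keywords inside string literals or comments would match too, so an AST-based check would be more robust.

```python
import re

# Hypothetical refinement of should_cache: match whole keywords only.
_COMPLEX_SQL = re.compile(r"\b(JOIN|WITH|GROUP\s+BY|HAVING)\b", re.IGNORECASE)


def should_cache_strict(sql: str) -> bool:
    """Return True when the SQL text contains a complex keyword as a whole word."""
    return _COMPLEX_SQL.search(sql) is not None
```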

## Monitoring and Debugging

```python
from sqlspec.statement.cache import (
    get_cache_config,
    get_cache_stats,
    reset_cache_stats,
    sql_cache,
    ast_fragment_cache,
)

# Check current configuration
config = get_cache_config()
print(f"SQL Cache Size: {config.sql_cache_size}")
print(f"SQL Cache Enabled: {config.sql_cache_enabled}")

# Monitor cache usage
stats = get_cache_stats()
print(f"SQL Cache: {stats.sql_size}/{sql_cache.max_size} entries")
print(f"Hit Rate: {stats.sql_hit_rate:.2%}")

# Reset statistics for benchmarking
reset_cache_stats()

# Direct cache inspection (for debugging)
print(f"SQL Cache has {sql_cache.size} entries")
print(f"Fragment Cache has {ast_fragment_cache.size} entries")
```
97 changes: 97 additions & 0 deletions docs/cheat_sheet/README.md
@@ -0,0 +1,97 @@
# SQLSpec Cheat Sheet Documentation

This directory contains comprehensive reference documentation for SQLSpec development.

## Documents

### 1. [SQLSpec Architecture Guide](sqlspec-architecture-guide.md)

A comprehensive 700+ line guide covering:

- Complete architecture overview with single-pass pipeline
- Data flow from SQL to execution through three-tier caching
- All mixin implementations and their methods
- Pipeline system with SQLTransformContext and compose_pipeline
- Driver implementation patterns with correct signatures
- Parameter handling and type preservation
- Special cases (ADBC NULL, psycopg COPY, etc.)
- Testing and development workflows

### 2. [Quick Reference](quick-reference.md)

Essential patterns and commands including:

- Public API with full type signatures
- Driver method signatures (execute, execute_many, execute_script)
- Pipeline processing order with caching layers
- Type definitions and filters
- Parameter styles by database
- Common overrides and special cases
- DO's and DON'Ts
- Testing patterns

## Key Takeaways

### Method Signatures (CRITICAL)

All driver methods must match these exact signatures:

```python
def _execute_statement(
    self,
    statement: SQL,
    connection: Optional[ConnectionT] = None,
    **kwargs: Any
) -> SQLResult:
    """Main dispatcher"""

def _execute(
    self,
    sql: str,
    parameters: Any,
    statement: SQL,
    connection: Optional[ConnectionT] = None,
    **kwargs: Any
) -> SQLResult:
    """Single execution"""

def _execute_many(
    self,
    sql: str,
    param_list: Any,
    connection: Optional[ConnectionT] = None,
    **kwargs: Any
) -> SQLResult:
    """Batch execution"""

def _execute_script(
    self,
    script: str,
    connection: Optional[ConnectionT] = None,
    **kwargs: Any
) -> SQLResult:
    """Script execution"""
```

### TypeCoercionMixin Is King

- **ALWAYS** use `_process_parameters()` for parameter extraction
- **NEVER** add custom parameter processing
- **ONLY** override specific `_coerce_*` methods when needed
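
The pattern behind these rules can be sketched as follows. This is an illustrative stand-in rather than SQLSpec's real mixin: `CoercionBase`, `IntegerBoolDriver`, and the specific hook names `_coerce_boolean` and `_coerce_decimal` are assumptions; only `_process_parameters()` and the `_coerce_*` naming convention come from the text above.

```python
from decimal import Decimal
from typing import Any, Sequence


class CoercionBase:
    """Hypothetical stand-in for a type-coercion mixin; not SQLSpec's real class."""

    def _coerce_boolean(self, value: bool) -> Any:
        return value

    def _coerce_decimal(self, value: Decimal) -> Any:
        return value

    def _process_parameters(self, parameters: Sequence[Any]) -> list:
        # Single shared entry point: dispatch each value to its _coerce_* hook.
        result = []
        for value in parameters:
            if isinstance(value, bool):
                result.append(self._coerce_boolean(value))
            elif isinstance(value, Decimal):
                result.append(self._coerce_decimal(value))
            else:
                result.append(value)
        return result


class IntegerBoolDriver(CoercionBase):
    """Example driver for a database without a native boolean type."""

    def _coerce_boolean(self, value: bool) -> int:
        return int(value)  # store booleans as 0/1
```

The driver changes one hook and inherits the rest, so parameter extraction stays in a single place instead of being reimplemented per adapter.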

### Golden Rules

1. **Trust the pipeline** - Single-pass processing handles complexity
2. **Parameters flow through context** - User → SQLTransformContext → Pipeline → Driver → Database
3. **Immutability** - Always return new instances
4. **AST over strings** - Use SQLGlot for SQL manipulation
5. **Leverage caching** - Three-tier system provides massive performance gains
6. **Use pipeline steps** - compose_pipeline() for custom transformations
7. **Test everything** - Especially parameter preservation and cache behavior
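
Rules 3 and 6 can be illustrated together with a small sketch of immutable context objects flowing through composed steps. `Context` and `compose_steps` are simplified, hypothetical stand-ins for `SQLTransformContext` and `compose_pipeline()`, whose real signatures are not shown here. The steps deliberately avoid rewriting SQL, since that should go through the AST (rule 4).

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical, simplified stand-in for a transform context."""

    sql: str
    parameters: Tuple[object, ...] = ()


Step = Callable[[Context], Context]


def compose_steps(*steps: Step) -> Step:
    """Chain steps left to right into one callable (illustrative only)."""

    def run(context: Context) -> Context:
        return reduce(lambda ctx, step: step(ctx), steps, context)

    return run


def strip_whitespace(ctx: Context) -> Context:
    return replace(ctx, sql=ctx.sql.strip())  # new instance, original untouched


def bools_to_ints(ctx: Context) -> Context:
    params = tuple(int(p) if isinstance(p, bool) else p for p in ctx.parameters)
    return replace(ctx, parameters=params)
```

Because `Context` is frozen, each step must return a new instance, which keeps the input available for caching and makes every step easy to test in isolation.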

## When to Reference

- **Starting a new adapter**: Review the architecture guide
- **Debugging parameter issues**: Check quick reference DO's and DON'Ts
- **Adding features**: Ensure you're not reimplementing mixin functionality
- **Type errors**: Verify against the exact method signatures