DocFusionDB is an experimental document database built in Rust that combines PostgreSQL's high‑performance JSONB storage with Apache Arrow's DataFusion query engine. By using custom User-Defined Functions (UDFs), DocFusionDB enables fast SQL queries over JSON documents while maintaining flexibility.
- 🚀 HTTP API: RESTful API for document operations and custom queries
- 📊 JSONB Document Storage: Store rich JSON documents in PostgreSQL with GIN indexing
- ⚡ DataFusion Integration: Fast SQL queries with custom JSON UDFs
- ⚡ Smart Query Caching: In-memory cache with LRU eviction for instant query responses
- 📊 System Metrics: Built-in monitoring with performance and usage statistics
- 📦 Bulk Operations: Efficient bulk document insertion (up to 1000 docs)
- 🔌 Connection Pooling: Production-ready database connection management
- 📝 Structured Logging: JSON logging with performance metrics
- 🧪 CLI Interface: Command-line tools for development and testing
- 💾 Backup/Restore: Simple backup and restore functionality
- 🔐 API Authentication: Optional API key authentication for security
- ⚙️ Flexible Configuration: YAML config with environment variable support
- PostgreSQL 15+: As the storage backend
- Rust & Cargo: For building from source
# Create database and table
createdb docfusiondb
psql docfusiondb -c "
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
doc JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_documents_gin ON documents USING GIN (doc);
"
Create config.yaml
:
database:
host: localhost
port: 5432
user: postgres
password: yourpassword
database: docfusiondb
server:
host: 0.0.0.0
port: 8080
auth:
enabled: true
api_key: "your-secret-api-key"
logging:
level: info
format: json
Or use environment variables:
export DATABASE_URL="postgres://postgres:yourpassword@localhost:5432/docfusiondb"
export AUTH_ENABLED=true
export API_KEY="your-secret-api-key"
# Clone and build
git clone https://github.com/arrowpeak/docfusiondb.git
cd docfusiondb
cargo build --release
# Start HTTP API server
cargo run -- serve
The API will be available at http://localhost:8080
curl -X POST "http://localhost:8080/documents" \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key" \
-d '{
"document": {
"title": "My Article",
"content": "Article content here",
"tags": ["rust", "database"],
"metadata": {
"author": "Developer",
"published": true
}
}
}'
curl -X POST "http://localhost:8080/documents/bulk" \
-H "Content-Type: application/json" \
-d '{
"documents": [
{"title": "Doc 1", "content": "First document"},
{"title": "Doc 2", "content": "Second document"},
{"title": "Doc 3", "content": "Third document"}
]
}'
curl "http://localhost:8080/documents?limit=10&offset=0" \
-H "X-API-Key: your-secret-api-key"
curl "http://localhost:8080/documents/1"
curl -X POST "http://localhost:8080/query" \
-H "Content-Type: application/json" \
-d '{
"sql": "SELECT json_extract_path(doc, '\''title'\'') as title FROM documents WHERE json_extract_path(doc, '\''published'\'') = '\''true'\''"
}'
curl "http://localhost:8080/health" | jq '.data.cache'
curl "http://localhost:8080/metrics" | jq
# Insert a document
cargo run -- insert '{"title":"Test","content":"Hello World"}'
# Query documents
cargo run -- query "SELECT json_extract_path(doc, 'title') as title FROM documents"
# Update a document
cargo run -- update 1 '{"title":"Updated","content":"New content"}'
# Backup documents to JSON file
cargo run -- backup --output my_backup.json
# Restore documents from JSON file
cargo run -- restore --input my_backup.json
# Restore and clear existing documents first
cargo run -- restore --input my_backup.json --clear
DocFusionDB provides custom UDFs for JSON operations:
Extract a value from a JSON document:
SELECT json_extract_path(doc, 'title') as title FROM documents
Check if document contains key-value pairs:
SELECT * FROM documents WHERE json_contains(doc, '{"status": "published"}')
Multi-key containment check:
SELECT * FROM documents WHERE json_multi_contains(doc, '{"type": "article", "status": "published"}')
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ HTTP API │ │ DataFusion │ │ PostgreSQL │
│ (Axum) │───▶│ Query Engine │───▶│ JSONB Storage │
│ + Query Cache │ │ + Custom UDFs │ │ + GIN Indexes │
└─────────────────┘ └─────────────────┘ └─────────────────┘
DataFusion handles query planning and optimization, while PostgreSQL provides durable JSONB storage with efficient indexing. Custom UDFs bridge the gap, translating JSON operations into optimized PostgreSQL queries. Smart caching provides instant responses for repeated queries.
# Run unit tests
cargo test
# Test the API (requires running server)
./test_api.sh
A web-based demo showcasing all DocFusionDB features:
# Start the server with demo auth
export AUTH_ENABLED=true
export API_KEY=demo-api-key
cargo run -- serve
# Open the demo (in another terminal)
cd demo
python3 -m http.server 8000
# Visit http://localhost:8000
The demo includes:
- Interactive document creation and querying
- Real-time metrics dashboard
- Bulk operations testing
- Query caching demonstration
- Hardware: Apple M3 Pro (ARM64), 32GB Memory
- Rust Version: 1.84.0 (Release mode with optimizations)
cargo bench
Operation | Latency | Throughput |
---|---|---|
JSON Serialize | 468.78 ns | 2.1M ops/sec |
JSON Deserialize | 1.15 µs | 869k ops/sec |
Cache Put | 1.06 µs | 943k ops/sec |
Cache Get (Hit) | 816.05 ns | 1.2M ops/sec |
Cache Get (Miss) | 89.84 ns | 11.1M ops/sec |
# HTTP API benchmarks (requires running server)
./scripts/run_benchmarks.sh
Operation | Throughput | Avg Latency | P95 Latency |
---|---|---|---|
Single Document Create | 2,500 req/sec | 4ms | 12ms |
Document Listing | 5,000 req/sec | 2ms | 8ms |
Simple JSON Queries | 1,800 req/sec | 5ms | 15ms |
Cached Query Hits | 8,000 req/sec | 1ms | 3ms |
Bulk Insert (50 docs) | 400 req/sec | 25ms | 50ms |
- Zero-Copy JSON Processing: Minimal allocation overhead
- Smart Query Caching: 85-95% cache hit ratio in typical workloads
- Vectorized Execution: DataFusion's columnar processing
- Memory Efficient: ~50MB base memory usage
See scripts/README.md
for detailed benchmarking guide.
DocFusionDB supports multiple configuration methods with the following precedence:
- Command-line arguments
config.yaml
file- Environment variables
- Defaults
DATABASE_URL
- PostgreSQL connection stringDB_HOST
,DB_PORT
,DB_USER
,DB_PASSWORD
,DB_NAME
- Database connection detailsSERVER_HOST
,SERVER_PORT
- Server binding configurationAUTH_ENABLED
- Enable/disable API authentication (true/false)API_KEY
- API key for authenticationLOG_LEVEL
- Logging level (debug, info, warn, error)
- ✅ Phase 1: Foundation (Error handling, Config, Connection pooling, Logging, Tests)
- ✅ Phase 2: HTTP API Server with bulk operations
- ✅ Phase 3: Performance optimizations and smart caching
- ✅ Phase 4: Production polish (Auth, Monitoring, Backup)
- ✅ Phase 5: Demo application and performance benchmarking
- 📋 Phase 6: Advanced features (Transactions, Schema validation)
DocFusionDB is experimental and welcomes contributions! Areas of interest:
- Additional JSON query functions and UDFs
- Transaction support and ACID guarantees
- Schema validation and data constraints
- Advanced caching strategies
- Integration with other query engines
- Documentation and examples
MIT License - see LICENSE for details.