WAL-Cake is a high-performance Change Data Capture (CDC) service that streams change events from PostgreSQL to an S3 data lake as Parquet files. It uses PostgreSQL's logical replication to capture database changes in real time and processes them efficiently for analytics and data warehousing.
- Real-time CDC: Captures INSERT, UPDATE, DELETE, and COMMIT operations from the PostgreSQL write-ahead log (WAL)
- Optimized Parquet Writing: Efficiently converts CDC events to columnar Parquet format with ZSTD compression (level 3)
- Reliable LSN Management: Acknowledges LSN positions to PostgreSQL only after the corresponding data has been successfully uploaded to S3, so a crash or failed upload does not lose changes
- Event Filtering: Configurable filtering of events (e.g., excluding commit events from Parquet files)
- Memory Efficient: Minimizes memory allocations and optimizes for high throughput
- Production Ready: Resilient error handling, structured logging, and performance optimization
WAL-Cake follows SOLID principles and is built with a clean, modular architecture:
- Replication Layer (internal/replication): Handles PostgreSQL logical replication using pglogrepl
- Transform Layer (internal/transform): Converts CDC events to Parquet format
- Storage Layer (internal/storage): Manages uploads to the S3 data lake
- Model Layer (internal/model): Defines core data structures
- Config Layer (internal/config): Manages application configuration
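One way to read the layer split is as small interfaces that the pipeline depends on. The sketch below is illustrative only: `Event`, `Encoder`, `Uploader`, and `Pipeline` are hypothetical names, and simple in-memory fakes stand in for the real Parquet writer and S3 client. It also drops COMMIT events before encoding, mirroring the event-filtering feature.

```go
package main

import (
	"bytes"
	"fmt"
)

// Event is a hypothetical stand-in for the model layer's CDC event type.
type Event struct {
	Op    string // "INSERT", "UPDATE", "DELETE" or "COMMIT"
	Table string
	LSN   uint64
}

// Encoder is what a transform layer might expose: events in, file bytes out.
type Encoder interface {
	Encode(events []Event) ([]byte, error)
}

// Uploader is what a storage layer might expose: bytes stored under a key.
type Uploader interface {
	Upload(key string, data []byte) error
}

// Pipeline depends only on the interfaces, so the Parquet encoder and the
// S3 client can be replaced by fakes in tests (dependency inversion).
type Pipeline struct {
	enc Encoder
	up  Uploader
}

// Flush drops COMMIT events, encodes the rest, and uploads the result.
// It returns the number of events written.
func (p Pipeline) Flush(key string, batch []Event) (int, error) {
	kept := make([]Event, 0, len(batch))
	for _, e := range batch {
		if e.Op != "COMMIT" {
			kept = append(kept, e)
		}
	}
	data, err := p.enc.Encode(kept)
	if err != nil {
		return 0, fmt.Errorf("encode: %w", err)
	}
	if err := p.up.Upload(key, data); err != nil {
		return 0, fmt.Errorf("upload: %w", err)
	}
	return len(kept), nil
}

// lineEncoder is a fake encoder that writes one CSV-like line per event.
type lineEncoder struct{}

func (lineEncoder) Encode(events []Event) ([]byte, error) {
	var b bytes.Buffer
	for _, e := range events {
		fmt.Fprintf(&b, "%s,%s,%d\n", e.Op, e.Table, e.LSN)
	}
	return b.Bytes(), nil
}

// memUploader is a fake uploader that keeps objects in a map.
type memUploader map[string][]byte

func (m memUploader) Upload(key string, data []byte) error {
	m[key] = data
	return nil
}

func main() {
	store := memUploader{}
	p := Pipeline{enc: lineEncoder{}, up: store}
	n, err := p.Flush("users/batch-0001", []Event{
		{Op: "INSERT", Table: "users", LSN: 10},
		{Op: "UPDATE", Table: "users", LSN: 11},
		{Op: "COMMIT", LSN: 12},
	})
	fmt.Println(n, err)
	fmt.Print(string(store["users/batch-0001"]))
}
```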
WAL-Cake requires:
- Go 1.21+
- PostgreSQL 13+ with logical replication enabled
- AWS S3 bucket or compatible storage service
WAL-Cake is configured via environment variables:
# PostgreSQL Configuration
PG_CONN_STRING=postgres://user:password@localhost:5432/dbname
PG_SLOT=wal_cake_slot
PG_PUBLICATION=wal_cake_pub
# S3 Configuration
S3_BUCKET_NAME=your-data-lake-bucket
NAMESPACE=your-namespace
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
# Optional: custom endpoint for S3-compatible services such as MinIO
AWS_ENDPOINT=https://s3.amazonaws.com
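For local development, docker-compose.yml provides the following services:
- postgres
- minio / s3 (S3-compatible object storage)
Start the services, then run the application:
docker compose up -d
./run.sh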
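To prepare PostgreSQL, use the names configured in PG_PUBLICATION and PG_SLOT:
- Enable logical replication in postgresql.conf (changing wal_level requires a server restart):
  wal_level = logical
- Create a publication:
  CREATE PUBLICATION wal_cake_pub FOR ALL TABLES;
- Create a replication slot:
  SELECT pg_create_logical_replication_slot('wal_cake_slot', 'pgoutput');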
WAL-Cake uses zerolog with logfmt format for structured logging. Key information logged includes:
- CDC event counts and types
- LSN positions and acknowledgments
- Parquet file sizes and compression ratios
- Error conditions and recovery actions
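LSN positions appear in PostgreSQL's text form, the high and low 32 bits of the 64-bit position in hexadecimal separated by a slash (for example 0/16B3748). The standard-library sketch below, with hypothetical helper names `parseLSN` and `formatLSN`, converts between the two forms; subtracting two LSNs gives the number of WAL bytes between them, which is a convenient way to read how far acknowledgments lag behind received events. The logfmt-style output uses plain fmt here, not zerolog.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatLSN renders a 64-bit LSN in PostgreSQL's "HIGH/LOW" hex form.
func formatLSN(lsn uint64) string {
	return fmt.Sprintf("%X/%X", uint32(lsn>>32), uint32(lsn))
}

// parseLSN is the inverse of formatLSN.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

func main() {
	received, err := parseLSN("0/16B3748")
	if err != nil {
		panic(err)
	}
	acked, err := parseLSN("0/16B0000")
	if err != nil {
		panic(err)
	}
	// The difference between two LSNs is a distance in WAL bytes.
	fmt.Printf("received=%s acked=%s pending_bytes=%d\n",
		formatLSN(received), formatLSN(acked), received-acked)
}
```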
Project structure:
wal-cake/
├── cmd/
│ └── cake/ # Main application entry point
├── internal/
│ ├── config/ # Configuration handling
│ ├── model/ # Data models
│ ├── replication/ # PostgreSQL replication logic
│ ├── storage/ # S3 storage interface
│ └── transform/ # Parquet transformation logic
├── docker-compose.yml # Local development environment
└── run.sh # Convenience script for running the application