Caution
The project is very much in the pre-alpha stage. It is more of an experiment and is not meant for production workloads.
This is a concept for what a Rails-inspired small data platform for startups and SMEs could look like. After using a variety of end-to-end solutions like DOMO, Keboola, Mozart Data and others, I keep wishing for something that would do 80% of ELT + BI out of the box, without the pricing surprises.
This project is an attempt to stitch together a set of solid and reliable open-source tools that combine into a lean platform where one data engineer can own the entire lifecycle. From ELT, to data modelling, to deploying and scaling in production.
- 🧪 From laptop to production in minutes - Develop locally with DuckDB, deploy with the same code. No more "it works on my machine" problems.
- ⚡ Lightning-fast analytics on any data size - DuckDB's column-oriented engine handles gigabytes of data on modest hardware. Query billions of rows in seconds.
- 📊 Beautiful dashboards - Drag-and-drop dataviz with Metabase. Perfect for everyone, tech and non-tech alike.
- 💸 Scale without breaking the bank - An enterprise-grade data stack for as little as $30/month. DuckDB + SQLMesh's efficiency means lower compute costs than Snowflake or BigQuery.
- 🔄 30+ ready-to-use integrations - Instant integrations via dlt for Stripe, GitHub, Salesforce, and more. Connect your SaaS tools with minimal code.
- 🤖 Just ask your DB - Ask questions in plain English through an MCP server for DuckDB. Get immediate answers without writing complex queries.
- 🔍 End-to-end data lineage - SQLMesh tracks transformations from raw to gold data. Understand exactly where metrics come from and debug easily.
- Local-first development for the entire stack.
- Support companies that can't afford heavy, expensive data tools or large teams.
- No "SSO tax" - all tools should be either fully free, or affordable once deployed for serious production use.
- No k8s, so a small data team can be self-sufficient.
- Cheap path to production and scaling.
- Extract (planned): dlt
- Transform: SQLMesh
- Data Storage: DuckDB
- BI / data viz: Metabase
- Deployment: Dokku
You'll need the following tools installed before getting started:
- uv
- mise
- claude (recommended)
- Clone this repository
- Download the DuckDB driver for Metabase:

  ```shell
  make download-duckdb-driver
  ```

- Start the services:

  ```shell
  docker-compose up -d
  ```

- Access Metabase at http://localhost:3000
TODO
This project can be deployed to DigitalOcean using Dokku with the following architecture:
- Metabase Container:
  - Dedicated hostname (e.g., metabase.yourdomain.com)
  - Access to the mounted DuckDB volume
- dlt + SQLMesh Container:
  - Combined container for data processing
  - Access to the same DuckDB volume
- Shared Storage:
  - DigitalOcean Volume for persistent DuckDB storage
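The shared-volume wiring above could be provisioned with Dokku's storage plugin along these lines (app names, the storage directory, and the domain are all illustrative; if you attach a DigitalOcean Volume, point the mounts at its path instead):

```shell
# Run on the droplet. Names and paths are placeholders.
dokku apps:create metabase
dokku apps:create kitsuna-data

# Host directory that both apps mount, so they share one DuckDB file.
dokku storage:ensure-directory kitsuna-duckdb
dokku storage:mount metabase /var/lib/dokku/data/storage/kitsuna-duckdb:/data
dokku storage:mount kitsuna-data /var/lib/dokku/data/storage/kitsuna-duckdb:/data

dokku domains:set metabase metabase.yourdomain.com
```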
- Create a Dokku-enabled droplet on DigitalOcean. See the Dokku docs: https://dokku.com/docs/getting-started/install/digitalocean/
- Deploy using the `app.json` configuration:

  ```shell
  # Clone the repository on your local machine
  git clone https://github.com/yourusername/kitsuna-data.git
  cd kitsuna-data

  # Add Dokku remote
  git remote add dokku dokku@your-droplet-ip:kitsuna-data

  # Push to Dokku - this will use the app.json configuration
  git push dokku main
  ```
Dokku will automatically:

- Create the apps defined in `app.json`
- Set up the specified resources
- Configure the mounts for shared storage
- Set up the domains
- Set up SSL (recommended):

  ```shell
  dokku letsencrypt:enable metabase
  ```
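Note that Let's Encrypt support comes from a Dokku plugin, and recent plugin versions require a contact e-mail before certificates can be issued. If the plugin isn't installed yet, the one-time setup looks like this (the e-mail address is a placeholder):

```shell
# Run once on the droplet.
sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
dokku letsencrypt:set metabase email you@example.com
```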
This deployment approach gives you:
- Separate containers for Metabase and data processing
- Shared persistent storage for DuckDB
- Simple deployment through Dokku
- Custom domain for Metabase
- Add SQLMesh
- Add MCP for DuckDB
- Add dlt
- Add Dokku deployment configuration
- Create a DigitalOcean box for a public demo
- Add installation docs
- Add usage docs
- Add Aider docs
Greg Goltsov - @gregoltsov, gregoltsov.bsky.social.
Here are some projects which inspired my thinking and this project: