A production-ready Docker template for Dagster Code Locations, providing a standardized starting point for implementing data pipelines. This repository serves as both a boilerplate and a reference implementation for Dagster code locations.
This repository is designed to work with sira-dagster-core, which provides the core Dagster infrastructure. While this repository contains your pipeline code, sira-dagster-core manages the Dagster webserver, daemon, and core infrastructure.
This repository represents one half of a distributed Dagster setup:
- **Core Infrastructure (sira-dagster-core):**
  - Dagster webserver and daemon processes
  - Core infrastructure configuration
  - Base monitoring and scheduling
  - Database and storage management
- **Code Location (this repository):**
  - Contains the actual pipeline definitions
  - Separate deployment lifecycle
  - Independent versioning
  - Flexible scaling options
This repository provides:
- **Base Docker Image:**
  - Pre-configured with common data processing dependencies
  - Optimized for Dagster code locations
  - Multi-architecture support (amd64/arm64)
- **Development Template:**
  - Standard project structure
  - Development tooling configuration
  - Testing framework setup
  - CI/CD workflows
- **Kitchen Sink Approach:**
  - Comprehensive set of data processing libraries
  - Common utilities pre-installed
  - Ready-to-use configurations
  - Data processing: pandas, numpy, pyarrow
  - ETL tools: dlt, dagster-dbt
  - Databases: dagster-postgres, clickhouse-connect
  - Cloud: dagster-aws
  - Utilities: pydantic, python-dotenv
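As a minimal, hypothetical sketch (the `summarize_orders` function is illustrative, not part of the template), this is the kind of transformation the pre-installed pandas stack supports inside your assets:

```python
import pandas as pd

def summarize_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate order totals per customer (illustrative example only)."""
    df = df.assign(total=df["quantity"] * df["unit_price"])
    return (
        df.groupby("customer", as_index=False)["total"]
        .sum()
        .sort_values("total", ascending=False)
    )

# Example input data
orders = pd.DataFrame({
    "customer": ["a", "b", "a"],
    "quantity": [2, 1, 3],
    "unit_price": [10.0, 5.0, 4.0],
})
print(summarize_orders(orders))
```

In a real code location this logic would live inside a Dagster asset or op definition rather than a standalone function.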
-
First, ensure you have the core infrastructure running:

- Deploy sira-dagster-core following its setup instructions
- Note the Dagster webserver URL and any required configuration

Then set up this repository:

- Clone this template:

  ```shell
  git clone https://github.com/[your-username]/dagster-code-location-template
  ```

- Configure your environment:

  ```shell
  cp .env.example .env
  # Edit .env with your specific configuration and core instance details
  ```

- Start the development server:

  ```shell
  docker-compose -f docker-compose-dev.yml up
  ```

- Access your code location through the Dagster UI at http://localhost:3000
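For orientation, a `docker-compose-dev.yml` along these lines is typical for this setup; the service name, paths, and port below are illustrative assumptions, not the template's actual file:

```yaml
# Hypothetical sketch of docker-compose-dev.yml
services:
  code_location:
    build: .
    env_file: .env
    ports:
      - "4000:4000"        # expose GRPC_PORT to the core instance
    volumes:
      - .:/opt/dagster/app # mount pipeline code for live reload in development
```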
Key environment variables (defined in `.env`):

- `DAGSTER_POSTGRES_*`: PostgreSQL connection details (should match the core instance)
- `GRPC_PORT`: Port for the code location server
- `PIPELINE_NAME`: Name of your pipeline
- `WORKING_DIRECTORY`: Pipeline working directory
- `DAGSTER_CORE_URL`: URL of your sira-dagster-core instance
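Putting those variables together, a `.env` might look like the following; all values here are placeholders and must be adjusted to your sira-dagster-core deployment:

```
# Placeholder values - adjust to your environment
DAGSTER_POSTGRES_HOST=postgres
DAGSTER_POSTGRES_USER=dagster
DAGSTER_POSTGRES_PASSWORD=change-me
DAGSTER_POSTGRES_DB=dagster
GRPC_PORT=4000
PIPELINE_NAME=my_pipeline
WORKING_DIRECTORY=/opt/dagster/app
DAGSTER_CORE_URL=http://localhost:3000
```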
The template supports multiple code locations:
```yaml
# workspace.yaml example
load_from:
  - grpc_server:
      host: localhost
      port: 4000
      location_name: "location_1"
```
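To register an additional code location, append another `grpc_server` entry; the second host, port, and location name below are illustrative:

```yaml
load_from:
  - grpc_server:
      host: localhost
      port: 4000
      location_name: "location_1"
  - grpc_server:
      host: localhost
      port: 4001
      location_name: "location_2"
```

Each entry must point at a running gRPC server, so every location needs its own server process and a distinct port.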
The base image includes:
- Python 3.12
- Common data processing libraries
- Dagster dependencies
- Development tools
Build the image with:

```shell
docker buildx bake -f docker-bake.hcl
```
- Environment-based configuration
- No hardcoded credentials
- Regular dependency updates
- Multi-architecture support
- Create a new branch for your pipeline:

  ```shell
  git checkout -b feature/my-pipeline
  ```
- Implement your pipeline in the appropriate code location directory
- Test locally using docker-compose
- Submit a PR following our guidelines