Description
What content needs to be created or modified?
The Dapr documentation lacks guidance on estimating the required storage size for the Scheduler service's embedded etcd database, leading to issues where the default 1GB or recommended 16GB storage fills up, especially for workloads with a high number of reminders. A new section is needed to provide a methodology for estimating storage needs that accounts for the total number of active jobs and reminders, the size of their data payloads, and the storage overhead from etcd's internal operations (such as its Write-Ahead Log and snapshots).
Describe the solution you'd like
A new documentation section titled Estimating Scheduler Storage Requirements should be added to the Dapr documentation. This section will:
- Explain the factors affecting etcd storage usage (e.g., number of jobs/reminders, payload size, etcd's Write-Ahead Log (WAL) and snapshots).
- Provide a simplified formula for estimating storage needs, similar to Prometheus' `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, adapted for Dapr Scheduler (e.g., `needed_disk_space = active_jobs * average_job_size * etcd_overhead_factor`); see the sketch after this list.
- Include a note that exact sizing requires benchmarking due to variability in workloads.
- Offer practical examples for calculating storage for common use cases (e.g., high reminder workloads, workflows with small payloads).
- Highlight the impact of etcd’s WAL and snapshots on storage growth.
- Provide guidance on monitoring storage usage and adjusting compaction settings (if configurable).
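To make the adapted formula concrete, here is a minimal sketch in Python. The function and parameter names, the example figures, and the 3x overhead factor are illustrative assumptions rather than documented Dapr or etcd values; actual sizing still requires benchmarking.

```python
# Hypothetical back-of-the-envelope estimator for Scheduler etcd storage.
# The 3x overhead factor standing in for WAL, snapshot, and MVCC-history
# growth is an illustrative assumption, not a documented etcd constant.

def estimate_scheduler_storage_bytes(
    active_jobs: int,                   # total active jobs + reminders
    average_job_size_bytes: int,        # bytes stored per job/reminder (key + payload + metadata)
    etcd_overhead_factor: float = 3.0,  # assumed multiplier for WAL, snapshots, MVCC history
) -> int:
    """Rough steady-state estimate; exact sizing requires benchmarking."""
    return int(active_jobs * average_job_size_bytes * etcd_overhead_factor)

# Example: 1,000,000 reminders at ~1 KiB stored per reminder.
estimate = estimate_scheduler_storage_bytes(1_000_000, 1024)
print(f"~{estimate / 1024**3:.1f} GiB")  # ~2.9 GiB with the assumed 3x overhead
```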
Where should the new material be placed?
TBD, but it could be added in:
- /operations/hosting/kubernetes/kubernetes-persisting-scheduler/
- /operations/hosting/kubernetes/kubernetes-production/
- or a new section in /operations/hosting/kubernetes
Additional context
There has been an increase in users struggling to capacity-plan their Scheduler storage requirements, as resizing Persistent Volume Claims (PVCs) for the Scheduler's etcd database when they fill up is operationally challenging and costly. This is a multi-step procedure that involves deleting and recreating the Scheduler's StatefulSet, as detailed in Increase Scheduler Storage Size.
For storage providers that do not support dynamic expansion, a re-install of the Dapr control plane is required to recreate the StatefulSet with a new persistent volume, significantly increasing the risk of downtime if not carefully managed. Proactive storage estimation and monitoring are essential to avoid this.
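As a starting point for that monitoring, here is a minimal sketch that reads the embedded etcd database size from a Prometheus-style metrics endpoint. The port-forwarded URL is hypothetical, and whether the Scheduler exposes the standard etcd gauge etcd_mvcc_db_total_size_in_bytes is an assumption to verify against your deployment.

```python
# Minimal sketch: report the embedded etcd database size from a
# Prometheus-style metrics endpoint. The URL is a hypothetical port-forward
# to the Scheduler's metrics port; exposure of the standard etcd gauge
# etcd_mvcc_db_total_size_in_bytes here is an assumption to verify.
import urllib.request

METRICS_URL = "http://localhost:9090/metrics"  # hypothetical port-forwarded endpoint

with urllib.request.urlopen(METRICS_URL) as resp:
    for line in resp.read().decode().splitlines():
        # Prometheus text format: "<metric_name> <value>" for unlabeled gauges.
        if line.startswith("etcd_mvcc_db_total_size_in_bytes "):
            name, value = line.rsplit(" ", 1)
            print(f"{name}: {float(value) / 1024**2:.1f} MiB")
```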