Description
What content needs to be created or modified?
The Dapr documentation lacks guidance on estimating the required storage size for the Scheduler service's embedded etcd database, leading to issues where the default 1GB or recommended 16GB storage fills up, especially for workloads with a high number of reminders. A new section is needed to provide a methodology for estimating storage needs that accounts for the total number of active jobs and reminders, the size of their data payloads, and the storage overhead from etcd's internal operations (such as its Write-Ahead Log and snapshots).
Describe the solution you'd like
A new documentation section titled Estimating Scheduler Storage Requirements should be added to the Dapr documentation. This section will:
- Explain the factors affecting etcd storage usage (e.g., number of jobs/reminders, payload size, etcd's Write-Ahead Log (WAL) and snapshots).
- Provide a simplified formula for estimating storage needs, similar to Prometheus' `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, adapted for Dapr Scheduler (e.g., `needed_disk_space = active_jobs * average_job_size * etcd_overhead_factor`); see the sketch after this list.
- Include a note that exact sizing requires benchmarking due to variability in workloads.
- Offer practical examples for calculating storage for common use cases (e.g., high reminder workloads, workflows with small payloads).
- Highlight the impact of etcd’s WAL and snapshots on storage growth.
- Provide guidance on monitoring storage usage and adjusting compaction settings (if configurable).
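To make the adapted formula concrete, here is a minimal sketch in Python. The function and parameter names, the example figures, and the 3x overhead factor are illustrative assumptions rather than documented Dapr or etcd values; actual sizing still requires benchmarking.

```python
# Hypothetical back-of-the-envelope estimator for Scheduler etcd storage.
# The 3x overhead factor standing in for WAL, snapshot, and MVCC-history
# growth is an illustrative assumption, not a documented etcd constant.

def estimate_scheduler_storage_bytes(
    active_jobs: int,                   # total active jobs + reminders
    average_job_size_bytes: int,        # bytes stored per job/reminder (key + payload + metadata)
    etcd_overhead_factor: float = 3.0,  # assumed multiplier for WAL, snapshots, MVCC history
) -> int:
    """Rough steady-state estimate; exact sizing requires benchmarking."""
    return int(active_jobs * average_job_size_bytes * etcd_overhead_factor)

# Example: 1,000,000 reminders at ~1 KiB stored per reminder.
estimate = estimate_scheduler_storage_bytes(1_000_000, 1024)
print(f"~{estimate / 1024**3:.1f} GiB")  # ~2.9 GiB with the assumed 3x overhead
```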
Where should the new material be placed?
TBD, but it could be added in:
- /operations/hosting/kubernetes/kubernetes-persisting-scheduler/
- /operations/hosting/kubernetes/kubernetes-production/
- or a new section in /operations/hosting/kubernetes
Additional context
There has been an increase in users struggling to capacity-plan their Scheduler storage requirements, as resizing Persistent Volume Claims (PVCs) for the Scheduler's etcd database when they fill up is operationally challenging and costly. This is a multi-step procedure that involves deleting and recreating the Scheduler's StatefulSet, as detailed in Increase Scheduler Storage Size.
For storage providers that do not support dynamic expansion, a re-install of the Dapr control plane is required to recreate the StatefulSet with a new persistent volume, significantly increasing the risk of downtime if not carefully managed. Proactive storage estimation and monitoring are essential to avoid this.
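As a starting point for that monitoring, here is a minimal sketch that reads the embedded etcd database size from a Prometheus-style metrics endpoint. The port-forwarded URL is hypothetical, and whether the Scheduler exposes the standard etcd gauge etcd_mvcc_db_total_size_in_bytes is an assumption to verify against your deployment.

```python
# Minimal sketch: report the embedded etcd database size from a
# Prometheus-style metrics endpoint. The URL is a hypothetical port-forward
# to the Scheduler's metrics port; exposure of the standard etcd gauge
# etcd_mvcc_db_total_size_in_bytes here is an assumption to verify.
import urllib.request

METRICS_URL = "http://localhost:9090/metrics"  # hypothetical port-forwarded endpoint

with urllib.request.urlopen(METRICS_URL) as resp:
    for line in resp.read().decode().splitlines():
        # Prometheus text format: "<metric_name> <value>" for unlabeled gauges.
        if line.startswith("etcd_mvcc_db_total_size_in_bytes "):
            name, value = line.rsplit(" ", 1)
            print(f"{name}: {float(value) / 1024**2:.1f} MiB")
```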