
Guidance on estimating scheduler storage requirements #4829

@MyMirelHub


What content needs to be created or modified?

The Dapr documentation lacks guidance on estimating the required storage size for the Scheduler service's embedded etcd database, leading to issues where the default 1GB or recommended 16GB storage fills up, especially for workloads with a high number of reminders. A new section is needed to provide a methodology for estimating storage needs that accounts for the total number of active jobs and reminders, the size of their data payloads, and the storage overhead from etcd's internal operations (like its Write-Ahead Log and snapshots).
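The three factors named above can be combined into a rough model: the live data (active jobs and reminders multiplied by payload plus per-entry metadata) scaled by a multiplier for etcd's own overhead. The sketch below is only an illustration. `SchedulerWorkload`, `estimate_storage_bytes`, the 1024-byte per-entry overhead and the 3.0 overhead factor are hypothetical placeholders, not values from Dapr or etcd, and should be replaced with numbers measured by benchmarking a real workload.

```python
from dataclasses import dataclass


@dataclass
class SchedulerWorkload:
    """Hypothetical workload description; every field is an estimate you supply."""

    active_jobs: int  # jobs + reminders stored at the same time
    avg_payload_bytes: int  # average size of each job's data payload
    # ASSUMPTION: bytes per entry for the key, schedule and other metadata.
    per_job_overhead_bytes: int = 1024
    # ASSUMPTION: multiplier covering etcd's WAL, snapshots and
    # not-yet-compacted revisions; replace with a benchmarked value.
    etcd_overhead_factor: float = 3.0


def estimate_storage_bytes(w: SchedulerWorkload) -> int:
    """Rough on-disk size: live data scaled by the etcd overhead factor."""
    live_data = w.active_jobs * (w.avg_payload_bytes + w.per_job_overhead_bytes)
    return int(live_data * w.etcd_overhead_factor)


if __name__ == "__main__":
    # Hypothetical: one million reminders with ~200-byte payloads.
    reminders = SchedulerWorkload(active_jobs=1_000_000, avg_payload_bytes=200)
    size = estimate_storage_bytes(reminders)
    print(f"{size:,} bytes ({size / 1024**3:.2f} GiB)")
```

Keeping the overhead as a single multiplier is deliberately crude; it makes the point that the WAL and snapshots grow the disk beyond the size of the stored payloads, without claiming how etcd lays that data out.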

Describe the solution you'd like

A new documentation section titled Estimating Scheduler Storage Requirements should be added to the Dapr documentation. This section will:

  • Explain the factors affecting etcd storage usage (e.g., number of jobs/reminders, payload size, etcd’s Write-Ahead Logs (WAL) and snapshots).
  • Provide a simplified formula for estimating storage needs, similar to Prometheus' needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, adapted for Dapr Scheduler (e.g., needed_disk_space = active_jobs * average_job_size, where active_jobs ≈ jobs_per_second * average_job_lifetime_seconds, so that the result is a size rather than a rate).
  • Include a note that exact sizing requires benchmarking due to variability in workloads.
  • Offer practical examples for calculating storage for common use cases (e.g., high reminder workloads, workflows with small payloads).
  • Highlight the impact of etcd’s WAL and snapshots on storage growth.
  • Provide guidance on monitoring storage usage and adjusting etcd compaction settings (if they are configurable).
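A Prometheus-style formula needs a time term to produce bytes: a rate such as jobs_per_second only becomes a storage size once it is multiplied by how long each job stays stored (Little's law: items held at once = arrival rate x time each item is kept). The sketch below applies that idea to the two example use cases and checks the result against the 1GB default and 16GB recommended volumes. The function names, the workload numbers and the 3.0 overhead factor are all hypothetical, and treating the volume sizes as GiB is an assumption.

```python
GIB = 1024**3

# ASSUMPTION: the 1GB default and 16GB recommended volumes are treated as GiB.
VOLUME_SIZES = {"default (1GB)": 1 * GIB, "recommended (16GB)": 16 * GIB}


def needed_disk_space(
    jobs_per_second: float,
    avg_job_lifetime_seconds: float,
    average_job_size_bytes: float,
    overhead_factor: float = 3.0,  # ASSUMPTION: WAL/snapshot multiplier, benchmark it
) -> int:
    """Bytes needed = active jobs x average size x etcd overhead."""
    # Little's law: items held at once = arrival rate x time each item is kept.
    active_jobs = jobs_per_second * avg_job_lifetime_seconds
    return int(active_jobs * average_job_size_bytes * overhead_factor)


def fitting_volumes(needed_bytes: int) -> list[str]:
    """Names of the volume sizes that can hold the estimate."""
    return [name for name, size in VOLUME_SIZES.items() if needed_bytes <= size]


if __name__ == "__main__":
    examples = {
        # Hypothetical: 500 reminders/s, each kept for one hour, ~1 KiB each.
        "high reminder workload": needed_disk_space(500, 3600, 1024),
        # Hypothetical: 50 workflow jobs/s, each kept for one minute, ~512 B each.
        "workflows, small payloads": needed_disk_space(50, 60, 512),
    }
    for name, needed in examples.items():
        print(f"{name}: {needed / GIB:.3f} GiB, fits in {fitting_volumes(needed)}")
```

With these made-up inputs the high-reminder case needs roughly 5 GiB and outgrows the default volume, while the small-payload workflow case needs only a few MiB. The lifetime term matters as much as the rate: long-lived reminders accumulate even at modest creation rates.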

Where should the new material be placed?

TBD, but it could be added in:

  • /operations/hosting/kubernetes/kubernetes-persisting-scheduler/
  • /operations/hosting/kubernetes/kubernetes-production/
  • or a new section in /operations/hosting/kubernetes

Additional context

A growing number of users are struggling to capacity-plan their Scheduler storage, because resizing the Persistent Volume Claims (PVCs) for the Scheduler's etcd database once they fill up is operationally challenging and costly. It is a multi-step procedure that deletes and recreates the Scheduler's StatefulSet, as detailed in Increase Scheduler Storage Size.

For storage providers that do not support dynamic volume expansion, a reinstall of the Dapr control plane is required to recreate the StatefulSet with a new persistent volume, which significantly increases the risk of downtime if not carefully managed. Proactive storage estimation and monitoring are essential to avoid this.
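Proactive monitoring can be reduced to two checks: how full the volume is now, and how many days remain at the current growth rate, so that a resize can be planned before the volume fills. The sketch below assumes the used and capacity figures come from whatever volume metrics your monitoring stack collects (for example, the kubelet's kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes, if they are scraped); it does not query them itself. The 70% threshold and 30-day lead time are hypothetical defaults, and `days_until_full` and `needs_attention` are illustrative names.

```python
import math

GIB = 1024**3


def days_until_full(used_bytes: int, capacity_bytes: int, growth_bytes_per_day: float) -> float:
    """Projected days before the volume is full at a constant growth rate."""
    if growth_bytes_per_day <= 0:
        return math.inf  # not growing: no projected fill date
    return max(capacity_bytes - used_bytes, 0) / growth_bytes_per_day


def needs_attention(
    used_bytes: int,
    capacity_bytes: int,
    growth_bytes_per_day: float,
    usage_threshold: float = 0.7,  # ASSUMPTION: alert at or above 70% full
    min_days_left: float = 30,  # ASSUMPTION: lead time needed to plan a resize
) -> bool:
    """True if the volume is already too full or is projected to fill too soon."""
    usage = used_bytes / capacity_bytes
    days_left = days_until_full(used_bytes, capacity_bytes, growth_bytes_per_day)
    return usage >= usage_threshold or days_left < min_days_left


if __name__ == "__main__":
    # Hypothetical readings for a 16 GiB Scheduler volume growing 0.5 GiB/day.
    print(days_until_full(12 * GIB, 16 * GIB, GIB / 2))
    print(needs_attention(12 * GIB, 16 * GIB, GIB / 2))
```

A linear projection is the simplest choice; workloads with bursty reminder creation may need the growth rate taken from a longer window so that a single spike does not dominate the estimate.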
