Skip to content

Metric queries return null data until whisper file exists #629

@bitprophet

Description

@bitprophet

Scenario

  • You have a large Graphite install and/or are just stuck with slow disks, such that data enters carbon-cache and then doesn't reach disk for a number of minutes.
  • You add new metric paths to your system that didn't exist before - new hosts or services came online, you renamed some metrics in your collectors, whatever.
  • You are querying Graphite-web for these new metrics directly (aka bypassing the 'finder') via direct HTTP calls, or a dashboard, etc.
  • Those queries result in no data until the above to-disk lag period passes.
  • Users get sad because they made all the changes on their end (re: data collection) but then they have to wait a long time for validation/usefulness.

Cause

Querying the webapp for a specific metric path returns no data until it hits disk even if data is in the cache, due to the short-circuit linked here.

Solution

Update readers.py such that it performs a best-effort check of the cache for the requested metric path before giving up and returning None.

Downsides/challenges

  • This may incur a (hopefully minor!) performance hit compared to existing behavior, in the case of truly-bogus metric path queries, but feels quite worth it, at least in my use case where lag is 20-30 minutes at times.
  • Unless Carbon/CarbonLink is capable of servicing glob-expression requests (guessing not, given the abovelinked code) this will only solve things for fully explicit queries. Still useful some of the time, but not a full solution.
    • EDIT: Yup, the query is straight-up keying into the carbon-cache's cache dict. Could probably shoehorn in a way to handle globs, however, if core team thinks that's a reasonable thing to do.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions