Closed
Description
Scenario
- You have a large Graphite install and/or are just stuck with slow disks, such that data enters carbon-cache and then doesn't reach disk for a number of minutes.
- You add new metric paths to your system that didn't exist before - new hosts or services came online, you renamed some metrics in your collectors, whatever.
- You are querying Graphite-web for these new metrics directly (aka bypassing the 'finder') via direct HTTP calls, or a dashboard, etc.
- Those queries result in no data until the above to-disk lag period passes.
- Users get sad because they made all the changes on their end (re: data collection) but then they have to wait a long time for validation/usefulness.
Cause
Querying the webapp for a specific metric path returns no data until it hits disk even if data is in the cache, due to the short-circuit linked here.
Solution
Update readers.py such that it performs a best-effort check of the cache for the requested metric path before giving up and returning None.
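A minimal sketch of the fallback idea, to make the proposal concrete. This is not the actual readers.py code: `fetch_with_cache_fallback`, `read_from_disk` and `query_cache` are hypothetical names, with the two callables standing in for the on-disk lookup and the CarbonLink query. It only covers the "no file on disk yet" case described above.

```python
from typing import Callable, List, Optional, Tuple

Datapoint = Tuple[int, float]  # (timestamp, value)


def fetch_with_cache_fallback(
    metric_path: str,
    read_from_disk: Callable[[str], Optional[List[Datapoint]]],
    query_cache: Callable[[str], List[Datapoint]],
) -> Optional[List[Datapoint]]:
    """Return datapoints for metric_path, trying the cache if disk has nothing.

    Both callables are hypothetical stand-ins:
    read_from_disk returns None when no file exists for the path;
    query_cache returns whatever the cache currently holds (possibly empty).
    """
    on_disk = read_from_disk(metric_path)
    if on_disk is not None:
        return on_disk

    # Nothing on disk yet: ask the cache before giving up.
    try:
        cached = query_cache(metric_path)
    except Exception:
        # Best effort only: an unreachable cache behaves like "no data".
        return None

    # An empty result is treated the same as a bogus path.
    return cached or None
```

For a truly bogus path this costs one extra cache lookup before returning None, which is the performance trade-off noted under the downsides.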
Downsides/challenges
- This may incur a (hopefully minor!) performance hit compared to existing behavior, in the case of truly-bogus metric path queries, but feels quite worth it, at least in my use case where lag is 20-30 minutes at times.
- Unless Carbon/CarbonLink is capable of servicing glob-expression requests (guessing not, given the above-linked code) this will only solve things for fully explicit queries. Still useful some of the time, but not a full solution.
- EDIT: Yup, the query is straight-up keying into the carbon-cache's cache dict. Could probably shoehorn in a way to handle globs, however, if core team thinks that's a reasonable thing to do.
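As a rough illustration of what glob handling could look like if the cache really is a plain dict keyed by metric path: the sketch below matches a pattern against the keys one dotted segment at a time, using Python's standard fnmatch module. `match_cached_paths` is a hypothetical helper, not anything that exists in Carbon, and it only covers what fnmatch supports (`*`, `?`, `[...]`), so brace alternatives like `{a,b}` are not handled.

```python
import fnmatch
from typing import Dict, List, Tuple

Datapoint = Tuple[int, float]  # (timestamp, value)


def match_cached_paths(cache: Dict[str, List[Datapoint]], pattern: str) -> List[str]:
    """Return cached metric paths matching a dotted glob pattern.

    Matching is done per segment because fnmatch's '*' would otherwise
    match across the '.' separators.
    """
    pattern_parts = pattern.split(".")
    matches = []
    for path in cache:
        parts = path.split(".")
        if len(parts) != len(pattern_parts):
            continue
        if all(
            fnmatch.fnmatchcase(part, pat)
            for part, pat in zip(parts, pattern_parts)
        ):
            matches.append(path)
    return sorted(matches)


# Example with a stand-in cache dict:
cache = {
    "servers.web1.cpu": [(100, 0.5)],
    "servers.web2.cpu": [(100, 0.7)],
    "servers.web1.disk.used": [(100, 42.0)],
}
print(match_cached_paths(cache, "servers.*.cpu"))
```

Note this scans every key in the cache per query, so on a large install the cost would need measuring before it could be considered reasonable.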