cbits: avoid div-by-zero, and thus NaN

thoughtpolice · thoughtpolice · commit ddd393446df3 · 2019-12-02T13:24:37.000-06:00
A newly created `Distribution` has a flaw: its `mean` is reported as
`NaN` on the first reading if no samples have been added yet.

Why? Because a newly created `Distribution` named `d` has a field
`d->count = 0`, which is used as the divisor in a floating-point
division. The numerator is zero as well. IEEE-754 defines 0.0/0.0 as
NaN, so the first reading of a `Distribution` with no samples returns
`NaN` for the `mean`.
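
A minimal sketch of how the `NaN` arises, assuming the mean is combined
as a count-weighted average. `toy_distrib` and `toy_combined_mean` are
hypothetical names used only for illustration; they are not the real
`struct distrib` or `hs_distrib_combine`:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for the real struct; only the fields needed here. */
struct toy_distrib {
    int64_t count;
    double mean;
};

/* Unguarded count-weighted mean of two distributions (illustrative only).
 * The integer counts are converted to double, so the division is a
 * floating-point one: with both counts zero it evaluates 0.0 / 0.0. */
double toy_combined_mean(const struct toy_distrib *a,
                         const struct toy_distrib *b) {
    const int64_t count = a->count + b->count;
    return (a->count * a->mean + b->count * b->mean) / count;
}
```

Calling `toy_combined_mean` on two zero-initialized values yields a
result for which `isnan()` is true, while any non-empty input produces
an ordinary weighted mean.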

This is problematic for a use case of mine: I want to use `ekg-statsd`
to export `Distribution` values to a metric logging system. However,
without this patch, the `mean` value is reported as `NaN` (thanks to its
`Show` instance), which causes the logging system to reject the metric
because it strictly expects floating-point values.

I wouldn't be surprised if other systems rejected `NaN` in such a case
when they try to scrape metrics. And it's not something clients of
`ekg-core` should really check for.

In this case, the fix is simple: we just check for `count == 0` and
return a mean of `0.0` if that's the case.
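
The guard can be sketched in isolation as follows. `guarded_mean` is a
hypothetical helper written for illustration, not a function in
`ekg-core`; it assumes the caller already has the weighted sum and the
combined count:

```c
#include <stdint.h>

/* Hypothetical helper: mean with an empty-input guard (illustrative only).
 * With no samples there is nothing to average, so report 0.0 instead of
 * evaluating 0.0 / 0.0. */
double guarded_mean(double weighted_sum, int64_t count) {
    return (count == 0) ? 0.0 : weighted_sum / count;
}
```

Returning `0.0` for the empty case is a convention rather than a
mathematical fact; it keeps the exported value a plain number that
downstream consumers can parse.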

Signed-off-by: Austin Seipp <aseipp@pobox.com>
diff --git a/cbits/distrib.c b/cbits/distrib.c
@@ -35,7 +35,7 @@ void hs_distrib_combine(struct distrib* b, struct distrib* a) {
   const StgDouble sum_sq_delta = (a->sum_sq_delta + b->sum_sq_delta +
                                   delta * delta * (a->count * b->count) / count);
   a->count = count;
-  a->mean = mean;
+  a->mean = (count == 0) ? 0.0 : mean; // divide-by-zero gives NaN
   a->sum_sq_delta = sum_sq_delta;
   a->sum = a->sum + b->sum;
   a->min = b->min;