Open
Labels
fold: sum, maximum, reduce, foldl, etc.
statistics: The Statistics stdlib module
Description
Is there a reason that we can't have a more numerically stable streaming mean
method for arbitrary iterators? Here's a sketch of an implementation:
function streaming_mean(itr)
    a, rest = Iterators.peel(itr)
    for (i, x) in enumerate(rest)
        a += (x - a) / (i + 1)
    end
    return a
end
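The sketch above assumes a non-empty iterator whose elements are already floating point. Here is one possible way to guard those two cases. The name `streaming_mean_checked` is hypothetical and not part of Statistics, and the `nothing` check assumes Julia 1.7 or later, where `Iterators.peel` returns `nothing` for an empty iterator:

```julia
# Hypothetical variant of the sketch above; not an existing Statistics function.
function streaming_mean_checked(itr)
    peeled = Iterators.peel(itr)
    # Assumes Julia 1.7+, where peel returns `nothing` for an empty iterator.
    peeled === nothing && throw(ArgumentError("mean of an empty iterator is undefined"))
    first_item, rest = peeled
    a = first_item / 1              # promote integer input to a floating-point type
    for (i, x) in enumerate(rest)
        # Running-mean update: a_n = a_{n-1} + (x_n - a_{n-1}) / n, with n = i + 1
        a += (x - a) / (i + 1)
    end
    return a
end
```

With this version, `streaming_mean_checked([1, 2, 3, 4])` gives `2.5`, and an empty input raises an `ArgumentError` instead of failing on destructuring.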
And here's a demonstration of the numerical performance:
julia> using Statistics
julia> A = rand(Float32, 100_000_000);
julia> itr = (rand(Float32) for _ in 1:100_000_000);
julia> mean(A), mean(itr), streaming_mean(itr)
(0.4999884f0, 0.16777216f0, 0.49996638f0)
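The middle value is suspicious: 0.16777216 is exactly 2^24 / 100_000_000, which is what you would expect if the iterator path accumulates sequentially in Float32 and the running sum stalls at 2^24 (I have not traced the exact code path, so treat that as an assumption). A small deterministic sketch of the stall, where `naive_sum` is a hypothetical stand-in for a plain left-to-right fold:

```julia
# Hypothetical stand-in for a sequential Float32 accumulation.
function naive_sum(itr)
    acc = 0f0
    for x in itr
        acc += x
    end
    return acc
end

# At 2^24 the spacing between adjacent Float32 values is 2,
# so adding 1 (or anything smaller) rounds back to 2^24.
stalled = 2f0^24
@show stalled + 1f0 == stalled

# Twenty million ones should sum to 2.0e7, but the accumulator stops growing at 2^24.
@show naive_sum(1f0 for _ in 1:20_000_000)
```

The streaming update avoids this because `a` stays on the scale of the data rather than growing with the number of elements.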
(I would imagine that there are some tweaks to be made to optimize the performance of streaming_mean, but I hope by now we've learned that correctness (i.e. numerical accuracy) is more important than performance.)