More accurate streaming mean method for arbitrary iterators #52365

@CameronBieganek

Is there a reason that we can't have a more numerically stable streaming mean method for arbitrary iterators? Here's a sketch of an implementation:

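function streaming_mean(itr)
    # Split off the first element; it seeds the running mean.
    # (An empty `itr` is not handled in this sketch.)
    a, rest = Iterators.peel(itr)
    for (i, x) in enumerate(rest)
        # x is element number i + 1, so move the mean by 1/(i + 1) of the gap.
        a += (x - a) / (i + 1)
    end
    return a
end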
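
For readers who want to see where the update line comes from, here is a short derivation. The symbols $m_n$ (mean of the first $n$ elements) and $x_n$ (the $n$-th element) are notation introduced here for illustration, not names from the sketch; in the loop above, $n = i + 1$ and `a` plays the role of $m_{n-1}$ before the update and $m_n$ after it.

```latex
\begin{aligned}
m_n &= \frac{1}{n}\sum_{k=1}^{n} x_k
     = \frac{(n-1)\,m_{n-1} + x_n}{n} \\
    &= m_{n-1} + \frac{x_n - m_{n-1}}{n}, \qquad m_1 = x_1 .
\end{aligned}
```

The accumulator therefore stays on the scale of the data instead of growing with the number of elements, which is why this form tends to lose less precision than dividing one large running sum at the end.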

And here's a demonstration of the numerical accuracy:

julia> using Statistics

julia> A = rand(Float32, 100_000_000);

julia> itr = (rand(Float32) for _ in 1:100_000_000);

julia> mean(A), mean(itr), streaming_mean(itr)
(0.4999884f0, 0.16777216f0, 0.49996638f0)
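
A likely reading of the `0.16777216f0` result is that a left-to-right `Float32` sum stops growing once it reaches 2^24 = 16777216, and 16777216 / 100_000_000 is exactly 0.16777216. The following is a minimal sketch of that effect on a smaller input, assuming a plain sequential accumulation; `naive_sum` is a hypothetical helper written for illustration (it is not the `Statistics` implementation), and `streaming_mean` is copied from the sketch above so the block runs on its own.

```julia
# Float32 has a 24-bit significand: above 2^24, adding 1 is rounded away.
limit = Float32(2^24)            # 16777216f0
@show limit + 1f0 == limit

# Hypothetical helper: naive left-to-right Float32 accumulation.
function naive_sum(itr)
    s = 0f0
    for x in itr
        s += x
    end
    return s
end

# Copy of the sketch from the post, so this block is self-contained.
function streaming_mean(itr)
    a, rest = Iterators.peel(itr)
    for (i, x) in enumerate(rest)
        a += (x - a) / (i + 1)
    end
    return a
end

n = 2^24 + 1000                  # slightly more elements than the sum can count
ones_itr = (1f0 for _ in 1:n)    # generators can be iterated again from the start

stalled = naive_sum(ones_itr)    # stops growing at 2^24 instead of reaching n
@show stalled
@show stalled / n                # naive mean, below the true mean of 1
@show streaming_mean(ones_itr)   # the running mean never leaves 1
```

With all-ones input the running mean never moves, because every correction `(x - a) / (i + 1)` is exactly zero, while the naive sum silently drops the last 1000 elements. Random inputs behave less cleanly, but the stall at 2^24 is the same mechanism.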

(I would imagine that there are some tweaks to be made to optimize the performance of streaming_mean, but I hope by now we've learned that correctness (i.e. numerical accuracy) is more important than performance.)

Labels: fold (sum, maximum, reduce, foldl, etc.), statistics (The Statistics stdlib module)
