Skip to content

maximum([fill(NaN, 255); missing]) !== maximum([fill(NaN, 256); missing]) (Should we fix reduce(max, ...) etc.?) #36287

Closed
@tkf

Description

@tkf

Currently, we have

julia> maximum([fill(NaN, 256); missing])
NaN

julia> maximum([fill(NaN, 255); missing])
missing

julia> maximum(x for x in [fill(NaN, 256); missing])
missing

julia> maximum(x for x in [fill(NaN, 255); missing])
missing

julia> max(missing, NaN)
missing

As you can see, maximum is incompatible with the definition of max depending on the size of the array. Furthermore, the result changes when processed with the identity iterator transformation.

I think maximum(xs) should be defined as reduce(max, xs) semantically. Currently, it is defined as such at the "syntax" level:

julia/base/reducedim.jl

Lines 716 to 725 in 13b07fc

for (fname, _fname, op) in [(:sum, :_sum, :add_sum), (:prod, :_prod, :mul_prod),
(:maximum, :_maximum, :max), (:minimum, :_minimum, :min)]
@eval begin
# User-facing methods with keyword arguments
@inline ($fname)(a::AbstractArray; dims=:, kw...) = ($_fname)(a, dims; kw...)
@inline ($fname)(f, a::AbstractArray; dims=:, kw...) = ($_fname)(f, a, dims; kw...)
# Underlying implementations using dispatch
($_fname)(a, ::Colon; kw...) = ($_fname)(identity, a, :; kw...)
($_fname)(f, a, ::Colon; kw...) = mapreduce(f, $op, a; kw...)

However, mapreduce implementation for max/min is not compatible with the mathematical definition of mapreduce.

There were related discussions on findmax/min and argmax/min (#27613, #35316) and I've been suggesting to define the total orderings compatible with max/min and use it to define all the similar functions, at least at the API definition level. I think this is the right direction as otherwise parallelizing these reducers is hard; i.e., we need to express them as folds with an associative operator which requires transitivity in the corresponding comparison. Furthermore, rigorously defining these reducers would let us confidently leverage the commutativity of max/min and define efficient maximum(::Diagonal) etc. #30236.

So, I suggest to:

  1. Use the mathematically correct definition in mapreduce(min, ...) and mapreduce(max, ...).
  2. Define the total ordering compatible with min (maybe in Add 2-arg versions of findmax/min, argmax/min #35316) and use it in argmin and findmin. (Note: isless is already max-compatible)

cc @nalimilan @StefanKarpinski @mbauman @JeffBezanson

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIndicates an unexpected problem or unintended behaviorcorrectness bug ⚠Bugs that are likely to lead to incorrect results in user code without throwingfoldsum, maximum, reduce, foldl, etc.minor changeMarginal behavior change acceptable for a minor releasemissing dataBase.missing and related functionality

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions