Description
I have some comments regarding correctness issues with minimum and maximum. One is about the handling of signed zero for floating points and the other about short circuiting for NaN and missing. I believe the current implementation favours performance over correctness, something I think one should be extremely careful with.
Before discussing the issues I want to give some context about the minimum and maximum functions and why I think it is extra important that these methods are correctly implemented. The call minimum(A::AbstractArray) eventually calls mapreduce(identity, min, A), and similarly minimum(f, A) eventually calls mapreduce(f, min, A). After some more indirection this calls Base._mapreduce(f, min, IndexLinear(), A). This method has some special cases if the length of A is small; more precisely it handles length(A) < 16 directly, and otherwise it calls Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A)). So far everything uses the default implementations of the reduction functions and there is no special handling of the min argument, but Base.mapreduce_impl does have a specialised method for min and max, found here.
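For concreteness, here is a small illustration of the two code paths (this calls Base internals directly and assumes the dispatch chain described above, which may of course change between Julia versions):

A = rand(20)

# length(A) >= 16: minimum(A) ends up in the specialised min/max method
Base.mapreduce_impl(identity, min, A, firstindex(A), lastindex(A))

# length(A) < 16: handled directly by Base._mapreduce with the generic reduction
Base._mapreduce(identity, min, IndexLinear(), rand(10))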
The above indirection leads to three potential issues:
- The implementation used for the reduction depends on the length of the vector. This makes it easy to miss bugs or correctness issues since it is common to only write tests for short vectors.
- It is impossible to dispatch on the return type of f. This means that if Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A)) makes some optimizations that are not valid for your custom type T, there is no way to handle this. For most Julia functions you can create a custom method by dispatching on your type, but here this is not possible. This calls for being extra careful that the implementation is correct in all possible cases.
- It might be possible to document any possible correctness issues for minimum and maximum. However, since they call mapreduce(f, min, A) you would get the same correctness issues if you for example call reduce(min, A), and it is not really possible to add documentation to reduce and mapreduce about special handling of min and max. Again this calls for being extra careful with correctness since it is not possible to document it fully.
Now to the actual issues. I give all examples using minimum but maximum has exactly the same issues. Whenever I refer to timings they are for an AMD Ryzen 9 5900X.
Handling of signed zeros
The main optimization done in the implementation of Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A)) is in the way it handles signed zeros. It calls _fast(min, x, y), which is defined by
_fast(::typeof(min),x,y) = min(x,y)
function _fast(::typeof(min),x::AbstractFloat, y::AbstractFloat)
    ifelse(isnan(x),
        x,
        ifelse(x < y, x, y))
end
so in general it calls min directly and there is nothing to discuss. But if x and y are floats it uses a simplified version of min which doesn't take into account the ordering of -0.0 and 0.0. To still enforce the ordering of the signed zeros it ends with
# enforce correct order of 0.0 and -0.0
# e.g. maximum([0.0, -0.0]) === 0.0
# should hold
if isbadzero(op, v)
    for i in first:last
        x = @inbounds A[i]
        isgoodzero(op,x) && return x
    end
end
However, the issue with this code is that it doesn't apply the function f to x before the comparison, so it only works correctly for f = identity (or as long as f preserves the sign, I guess). For example we have
julia> minimum(abs, fill(-0.0, 16))
-0.0
julia> minimum(abs.(fill(-0.0, 16)))
0.0
Note that if you give fewer than 16 elements it uses another implementation and you get a correct result (so the example maximum([0.0, -0.0]) === 0.0 in the comment doesn't actually use the code below the comment), which makes it harder to track down this issue.
julia> minimum(abs, fill(-0.0, 15))
0.0
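The root cause is visible already at the level of _fast (again calling the Base internal directly and assuming the definition quoted above): min orders -0.0 before 0.0, while the simplified comparison does not.

julia> min(-0.0, 0.0)
-0.0

julia> Base._fast(min, -0.0, 0.0)
0.0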
From what I can tell there is no simple fix for this which doesn't sacrifice performance. One possibility is to add the application of f to the check for -0.0. It would give correct results but would mean applying f twice on each element, which I don't think is an option. The most "correct" solution would be to avoid using the _fast method and just use min directly. However, this comes at a fairly large performance penalty compared to the current implementation, about a 100% increase in runtime for Float64 in some benchmarks I did. Another, slightly more involved, solution is to special case on f = identity and apply this optimization for that case but not otherwise.
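To make the first of these options concrete, the fix-up loop from the Base code quoted above could apply f before the check, along the lines of this sketch:

# sketch: apply f before checking for a good zero, so the fix-up is also
# correct for f != identity (at the cost of evaluating f a second time)
if isbadzero(op, v)
    for i in first:last
        x = @inbounds A[i]
        fx = f(x)
        isgoodzero(op, fx) && return fx
    end
end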
Personally I would lean towards the correct but slower solution: avoid using _fast and call min directly. But I don't know what kind of performance regressions people are willing to accept.
Lastly, I can also mention that the current version gives a quite noticeable performance penalty if the minimum happens to be 0.0, since the array is traversed twice.
julia> A1 = rand(10^6);
julia> A2 = [0.0; rand(10^6 - 1)];
julia> @btime minimum($A1)
174.130 μs (0 allocations: 0 bytes)
1.9306038734345776e-7
julia> @btime minimum($A2)
412.420 μs (0 allocations: 0 bytes)
0.0
Short circuiting for NaN and missing
The implementation has a short circuit meant for when it encounters NaN or missing. The relevant code is
while simdstop <= last - 3
    # short circuit in case of NaN or missing
    (v1 == v1) === true || return v1
    (v2 == v2) === true || return v2
    (v3 == v3) === true || return v3
    (v4 == v4) === true || return v4
    @inbounds for i in start:4:simdstop
        v1 = _fast(op, v1, f(A[i+0]))
        v2 = _fast(op, v2, f(A[i+1]))
        v3 = _fast(op, v3, f(A[i+2]))
        v4 = _fast(op, v4, f(A[i+3]))
    end
    checkbounds(A, simdstop+3)
    start += chunk_len
    simdstop += chunk_len
end
The benefit of this is that we can avoid looking through the whole vector in some cases. However, it has some issues. The main issue is that it short circuits on BOTH NaN and missing and hence can return NaN in cases where it would be more correct to return missing. Exactly which one is returned is also slightly involved, which makes this case very hard to test for. Some examples:
julia> # Result depends on order of NaN and missing
julia> A1 = vcat([NaN, missing], fill(0.0, 255));
julia> minimum(A1)
NaN
julia> A2 = vcat([missing, NaN], fill(0.0, 255));
julia> minimum(A2)
missing
julia> # Result depends on length of vector
julia> A3 = vcat([0.0, 0.0, missing, 0.0, 0.0, NaN], fill(0.0, 506));
julia> minimum(A3)
missing
julia> A4 = vcat([0.0, 0.0, missing, 0.0, 0.0, NaN], fill(0.0, 507));
julia> minimum(A4)
NaN
Here I think the obvious solution is to just remove this short circuiting. As mentioned in #35939 it seems unlikely that this is in the hot path in many cases. For the vast majority of cases this should not lead to any performance reductions; it might even lead to very slight performance improvements. If one really wants to keep this behaviour then one option could be to only short circuit on missing, in which case the behaviour would be correct (a sketch of this is given below). If one really wants to, I guess it might be possible to check whether the return type of f contains missing and short circuit on NaN if it does not; in that case the check should be updated to use isnan. But as I said, I would opt towards just removing the short circuiting altogether.
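If one did want to keep a short circuit, a missing-only version could simply replace the (v == v) === true checks in the loop quoted above with something like the following sketch (returning early on missing is always correct, since min and max propagate missing):

# short circuit only in case of missing
ismissing(v1) && return missing
ismissing(v2) && return missing
ismissing(v3) && return missing
ismissing(v4) && return missing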
Behaviour for other types
Finally, I wanted to give some observations on the behaviour for other types. The reason I took a look at this function in the first place was due to issues I encountered when developing Arblib.jl.
For AbstractFloat it uses ifelse(isnan(x), x, ifelse(x < y, x, y)) instead of min directly. Except for not handling signed zeros this is equivalent to min for Float64 (and most other AbstractFloat types in Base). For BigFloat it has the benefit of not allocating (which min does). In fact I don't know why min(x::BigFloat, y::BigFloat) is written to allocate a new variable for the result; in contrast, the implementation of min(x::BigInt, y::BigInt) does not allocate a new variable.
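The difference is easy to check with a quick benchmark sketch (assuming BenchmarkTools is available):

using BenchmarkTools

x, y = big"1.0", big"2.0"
@btime min($x, $y)   # allocates a new BigFloat for the result

a, b = big"1", big"2"
@btime min($a, $b)   # returns one of its arguments, no allocation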
My issue with using this approach instead of min directly is that it is not valid for the Arb type that occurs in Arblib.jl. The Arb type can, slightly simplified, be described as a floating point with some extra information keeping track of rounding errors; when taking the minimum these rounding errors have to be taken into account, and this is not done with the above check. Now this might very well be my own fault for subtyping AbstractFloat even though the type doesn't satisfy the assumptions made by Julia, but in many cases it is very natural to treat an Arb as a floating point and see the tracking of the rounding errors as a bonus. In most cases you can then just dispatch on the Arb type when you need some different implementation, but as mentioned in the beginning this is not possible for minimum.
The second observation is related to the current implementation of the short circuiting. It currently assumes that (x == x) === true is false only for NaN and missing. While this is true for all types in base Julia, I don't think it should be assumed for custom types. For example, the Arb type mentioned above doesn't satisfy this assumption (x == x is false if the rounding error is non-zero). It would be better to be more explicit and instead use something along the lines of (x isa Number && isnan(x)) || ismissing(x) as mentioned in #35989; this should also be slightly faster in many cases. Though my stance is still that the short circuiting should be removed entirely.
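As a small illustration of that suggestion (the helper name is just for this example, not proposed API):

# stop the reduction only for values that are genuinely NaN or missing,
# not for every value where x == x happens to be false
shortcircuit(x) = (x isa Number && isnan(x)) || ismissing(x)

shortcircuit(NaN)      # true
shortcircuit(missing)  # true
shortcircuit(1.0)      # false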
What to do now?
I would be happy to prepare a PR to handle some of these issues. But exactly which approach to take depends on how much performance one is willing to sacrifice and how much code complexity one is willing to add. Below are some possible solutions; I can add more benchmark information if it is of interest.
The simplest possible implementation which handles all of the above mentioned issues would be
function mapreduce_impl(
    f,
    op::Union{typeof(max),typeof(min)},
    A::AbstractArrayOrBroadcasted,
    first::Int,
    last::Int,
)
    @inbounds v1 = mapreduce_first(f, op, A[first])
    v2 = v3 = v4 = v1
    @inbounds indices = (first+1):4:(last-3)
    for i = indices
        v1 = op(v1, f(A[i+0]))
        v2 = op(v2, f(A[i+1]))
        v3 = op(v3, f(A[i+2]))
        v4 = op(v4, f(A[i+3]))
    end
    v = op(op(v1, v2), op(v3, v4))
    @inbounds for i = (indices[end]+3):last
        v = op(v, f(A[i]))
    end
    return v
end
The drawbacks of this approach are no short circuiting and worse performance for floating points. For Float64 it is about 100% slower. For BigFloat it is about 300% slower due to min allocating, whereas the previous version didn't allocate. The performance regression for BigFloat I think is acceptable; if you want to make it faster you can use MutableArithmetics.jl. For Float64 I'm not sure whether the performance penalty is okay or not.
A solution which solves some of the issues but doesn't pay much in terms of performance would be to replace the op in the above implementation with the _fast version. This would mean that it doesn't handle signed zeros correctly, but at least it handles NaN and missing correctly. The only performance regression would be the removal of the short circuiting. In this case a note should probably be added to minimum about not handling signed zeros.
Finally, the most complicated solution would be to special case on f = identity, and in that case use _fast and explicitly check for the result being zero at the end, falling back to min if f is not identity.
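A skeleton of what that special casing could look like (the bodies are elided since they are just the two implementations discussed above):

# fast path: f is identity, so the _fast comparisons plus the final
# signed-zero check from the current implementation are valid
function mapreduce_impl(f::typeof(identity), op::Union{typeof(max),typeof(min)},
                        A::AbstractArrayOrBroadcasted, first::Int, last::Int)
    # current implementation using _fast and the isbadzero/isgoodzero fix-up
end

# general f: fall back to calling min/max directly
function mapreduce_impl(f, op::Union{typeof(max),typeof(min)},
                        A::AbstractArrayOrBroadcasted, first::Int, last::Int)
    # the simple implementation shown at the top of this section
end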
None of the solutions keep the short circuiting; this I think is a good trade-off in terms of correctness versus performance. But others might think differently.