diff --git a/NEWS.md b/NEWS.md index a29fe2d43ab39..718bf2506101c 100644 --- a/NEWS.md +++ b/NEWS.md @@ -98,6 +98,11 @@ New library features Standard library changes ------------------------ +* The `init` keyword argument in `reduce(op, itr; init)` and other reduction functions with implementation-defined + associativity (`mapreduce`, `maximum`, `minimum`, `sum`, `prod`, `any`, and `all`) + is now guaranteed to be used for non-empty collections one or more times. This ensures that all calls + to the reducing function `op` have one argument that is either `init` or the result from a + previous `op` evaluation. Previously `init` was explicitly allowed to be omitted from the reduction entirely. * `gcdx(0, 0)` now returns `(0, 0, 0)` instead of `(0, 1, 0)` ([#40989]). #### StyledStrings diff --git a/base/reduce.jl b/base/reduce.jl index bbfd66e5686ed..63ac542e48ff8 100644 --- a/base/reduce.jl +++ b/base/reduce.jl @@ -272,29 +272,90 @@ mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted, ifirst::Integer, ilast::Int """ mapreduce(f, op, itrs...; [init]) -Apply function `f` to each element(s) in `itrs`, and then reduce the result using the binary -function `op`. If provided, `init` must be a neutral element for `op` that will be returned -for empty collections. It is unspecified whether `init` is used for non-empty collections. -In general, it will be necessary to provide `init` to work with empty collections. +Apply function `f` to each element(s) in `itrs` (akin to [`map`](@ref)), +and then repeatedly call the 2 argument function `op` on those results, +`init`, or the result of a previous `op` evaluation until all elements +in the `itrs` have been included in the computation and a single value +is returned (akin to [`reduce`](@ref)). + +The optional `init` keyword argument must be an identity element for `op` as +it may be included in the reduction one or more times when provided. +The `init` value is not transformed by the function `f`. +Providing `init` ensures that `op` is never called with both its arguments coming from +the mapped `itrs`; it always combines the mapped element(s) with `init` or with the +results of a previous `op` call, or it combines the results of two previous `op` calls. + +The reduction function `op` should be associative but is not required to be commutative. +When the `itrs` are ordered collections, `reduce` preserves the ordering of its elements +in the arguments to `op` from left to right. In contrast to [`mapfoldl`](@ref) and +[`mapfoldr`](@ref), `reduce` does not enforce a particular associativity and some +implementations may reuse the return value of `f` for elements that appear multiple times in +`itrs`. See the extended help for more details. -[`mapreduce`](@ref) is functionally equivalent to calling -`reduce(op, map(f, itr); init=init)`, but will in general execute faster since no -intermediate collection needs to be created. See documentation for [`reduce`](@ref) and -[`map`](@ref). +For empty collections, `init` is the return value. It is generally an error to call `mapreduce` +with an empty collection without specifying an `init` value, but in some unambiguous cases a known +identity element for `op` may be returned. -!!! compat "Julia 1.2" - `mapreduce` with multiple iterators requires Julia 1.2 or later. +Some commonly-used operators may have special implementations of a mapped reduction, and +should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, +[`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. # Examples ```jldoctest -julia> mapreduce(x->x^2, +, [1:3;]) # == 1 + 4 + 9 -14 +julia> mapreduce(√, +, [1, 4, 9, 16]) +10.0 + +julia> mapreduce(x->x/10, +, [1, 2, 3, 4]) ≈ 1.0 +true + +julia> mapreduce(uppercase, *, ['j','u','l','i','a']) +"JULIA" ``` -The associativity of the reduction is implementation-dependent. Additionally, some -implementations may reuse the return value of `f` for elements that appear multiple times in -`itr`. Use [`mapfoldl`](@ref) or [`mapfoldr`](@ref) instead for -guaranteed left or right associativity and invocation of `f` for every value. +# Extended Help + +## Arbitrary associativity: examples and consequences + +The associativity of the reduction is not specified, so the `op` function +should be associative and `init` (if provided) must be an identity element. +To demonstrate this, consider the example: + +```jldoctest +julia> mapreduce(√, +, [1, 4, 9, 16]; init=0.0) +10.0 +``` + +There are many possible ways in which `reduce` might compute this, +including: + * `(((0 + √1) + √4) + √9) + √16` (left-associative, like [`mapfoldl`](@ref)) + * `√1 + (√4 + (√9 + (√16 + 0)))` (right-associative, like [`mapfoldr`](@ref)) + * `((0 + √1) + (√4 + 0)) + ((0 + √9) + (√16 + 0))` (potentiall parallel) +The exact strategy does not matter; `reduce` returns `10.0` because these additions +are associative and `0` is a commutative identity element for `+`. Note how the `init` +value may be used one or more times with varying argument orderings and +is not transformed by `√`, but the ordering of the values in `[1,4,9,16]` is always maintained. + +In contrast, subtraction is not associative. Were `-` erroneously used instead +of `+` in the above example, `mapreduce` will return an arbitrary value as the +exact strategy is not defined. The three example strategies above yield +three different results (`-10.0`, `-2.0` and `4.0`, respectively). + +More subtly, floating point arithmetic is typically non-associative, +even for common operations like `+` that are associative in exact arithmetic. +This means that the magnitude of floating-point errors incurred by `mapreduce` are +also unspecified. For example, `mapreduce(x->x/10, +, [1, 2, 3, 4])` may return +either `1.0` or `0.9999999999999999`, and both are [`≈`](@ref Base.isapprox) to `1.0`: + +```jldoctest +julia> mapfoldl(x->x/10, +, [1, 2, 3, 4]) +1.0 + +julia> mapfoldr(x->x/10, +, [1, 2, 3, 4]) +0.9999999999999999 + +julia> mapreduce(x->x/10, +, [1, 2, 3, 4]) ≈ 1.0 +true +``` """ mapreduce(f, op, itr; kw...) = mapfoldl(f, op, itr; kw...) mapreduce(f, op, itr, itrs...; kw...) = reduce(op, Generator(f, itr, itrs...); kw...) @@ -444,36 +505,88 @@ _mapreduce(f, op, ::IndexCartesian, A::AbstractArrayOrBroadcasted) = mapfoldl(f, """ reduce(op, itr; [init]) -Reduce the given collection `itr` with the given binary operator `op`. If provided, the -initial value `init` must be a neutral element for `op` that will be returned for empty -collections. It is unspecified whether `init` is used for non-empty collections. +Repeatedly call the 2 argument function `op` on the element(s) in the `itr` +collection, `init`, or the result of a previous `op` evaluation until all elements +in `itr` have been included in the computation and a single value is returned. -For empty collections, providing `init` will be necessary, except for some special cases -(e.g. when `op` is one of `+`, `*`, `max`, `min`, `&`, `|`) when Julia can determine the -neutral element of `op`. +The optional `init` keyword argument must be an identity element for `op` as +it may be included in the reduction one or more times when provided. Providing +`init` ensures that `op` is never called with both its arguments coming from +`itr`; it always combines an element in `itr` with `init` or with the results +of a previous `op` call, or it combines the results of two previous `op` calls. -Reductions for certain commonly-used operators may have special implementations, and -should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, -[`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. -There are efficient methods for concatenating certain arrays of arrays -by calling `reduce(`[`vcat`](@ref)`, arr)` or `reduce(`[`hcat`](@ref)`, arr)`. +The reduction function `op` should be associative but is not required to be commutative. +When `itr` is an ordered collection, `reduce` preserves the ordering of its elements +in the arguments to `op` from left to right. In contrast to [`foldl`](@ref) and +[`foldr`](@ref), `reduce` does not enforce a particular associativity. See the +extended help for more details. -The associativity of the reduction is implementation dependent. This means that you can't -use non-associative operations like `-` because it is undefined whether `reduce(-,[1,2,3])` -should be evaluated as `(1-2)-3` or `1-(2-3)`. Use [`foldl`](@ref) or -[`foldr`](@ref) instead for guaranteed left or right associativity. +For empty collections, `init` is the return value. It is generally an error to call `reduce` +with an empty collection without specifying an `init` value, but in some unambiguous cases a known +identity element for `op` may be returned. -Some operations accumulate error. Parallelism will be easier if the reduction can be -executed in groups. Future versions of Julia might change the algorithm. Note that the -elements are not reordered if you use an ordered collection. +Some commonly-used operators may have special implementations of a reduction, and +should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, +[`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. # Examples ```jldoctest -julia> reduce(*, [2; 3; 4]) -24 +julia> reduce(+, [1, 2, 3, 4]; init=0) +10 -julia> reduce(*, Int[]; init=1) +julia> reduce(*, []; init=1) 1 + +julia> reduce(+, [.1, .2, .3, .4]) ≈ 1.0 +true + +julia> reduce(*, ['J','u','l','i','a']) +"Julia" +``` + +# Extended Help + +## Arbitrary associativity: examples and consequences + +The associativity of the reduction is not specified, so the `op` function +should be associative and `init` (if provided) must be an identity element. +To demonstrate this, consider the example: + +```jldoctest +julia> reduce(+, [1, 2, 3, 4]; init=0) +10 +``` + +There are many possible ways in which `reduce` might compute this, +including: + * `(((0 + 1) + 2) + 3) + 4` (left-associative, like [`foldl`](@ref)) + * `1 + (2 + (3 + (4 + 0)))` (right-associative, like [`foldr`](@ref)) + * `((0 + 1) + (2 + 0)) + ((0 + 3) + (4 + 0))` (potentially parallel) +The exact strategy does not matter; `reduce` returns `10` because integer `+` +is associative and `0` is its identity. Note how the `init` value is used +one or more times with varying argument orderings, but the ordering of the +values in `[1,2,3,4]` is always maintained. + +In contrast, integer subtraction is not associative. Were `-` erroneously +used instead of `+` above, `reduce` will return an arbitrary value as the +exact strategy is not defined. The three example strategies above yield +three different results (`-10`, `-2` and `4`, respectively). + +More subtly, floating point arithmetic is _also_ typically non-associative, +even for common operations like `+` that are associative in exact arithmetic. +This means that the magnitude of floating-point errors incurred by `reduce` are +also unspecified. For example, `reduce(+, [.1, .2, .3, .4])` may return +either `1.0` or `0.9999999999999999`, and both are [`≈`](@ref Base.isapprox) to `1.0`: + +```jldoctest +julia> foldl(+, [.1, .2, .3, .4]) +1.0 + +julia> foldr(+, [.1, .2, .3, .4]) +0.9999999999999999 + +julia> reduce(+, [.1, .2, .3, .4]) ≈ 1.0 +true ``` """ reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...) @@ -493,9 +606,13 @@ The return type is `Int` for signed integers of less than system word size, and `UInt` for unsigned integers of less than system word size. For all other arguments, a common return type is found to which all arguments are promoted. -The value returned for empty `itr` can be specified by `init`. It must be -the additive identity (i.e. zero) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an additive identity (i.e., zero), +and it provides the return value for empty collections. It is generally an error to call `sum` +with an empty collection without specifying an `init` value, but in some unambiguous cases the +[`zero`](@ref) of the return type may be returned. + +Like [`mapreduce`](@ref), the associativity of the additions is not specified and the return +value of `f` may be reused for repeated elements. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -505,21 +622,6 @@ for non-empty collections. julia> sum(abs2, [2; 3; 4]) 29 ``` - -Note the important difference between `sum(A)` and `reduce(+, A)` for arrays -with small integer eltype: - -```jldoctest -julia> sum(Int8[100, 28]) -128 - -julia> reduce(+, Int8[100, 28]) --128 -``` - -In the former case, the integers are widened to system word size and therefore -the result is 128. In the latter case, no such widening happens and integer -overflow results in -128. """ sum(f, a; kw...) = mapreduce(f, add_sum, a; kw...) @@ -532,9 +634,12 @@ The return type is `Int` for signed integers of less than system word size, and `UInt` for unsigned integers of less than system word size. For all other arguments, a common return type is found to which all arguments are promoted. -The value returned for empty `itr` can be specified by `init`. It must be -the additive identity (i.e. zero) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an additive identity (i.e., zero), +and it provides the return value for empty collections. It is generally an error to call `sum` +with an empty collection without specifying an `init` value, but in unambiguous cases the +[`zero`](@ref) of the return type may be returned. + +Like [`reduce`](@ref), the associativity of the additions is not specified. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -549,6 +654,21 @@ julia> sum(1:20) julia> sum(1:20; init = 0.0) 210.0 ``` + +Note the important difference between `sum(A)` and `reduce(+, A)` for arrays +with small integer eltype: + +```jldoctest +julia> sum(Int8[100, 28]) +128 + +julia> reduce(+, Int8[100, 28]) +-128 +``` + +In the former case, the integers are widened to system word size and therefore +the result is 128. In the latter case, no such widening happens and integer +overflow results in -128. """ sum(a; kw...) = sum(identity, a; kw...) sum(a::AbstractArray{Bool}; kw...) = @@ -564,9 +684,13 @@ The return type is `Int` for signed integers of less than system word size, and `UInt` for unsigned integers of less than system word size. For all other arguments, a common return type is found to which all arguments are promoted. -The value returned for empty `itr` can be specified by `init`. It must be the -multiplicative identity (i.e. one) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be a multiplicative identity (i.e., one), +and it provides the return value for empty collections. It is generally an error to call `prod` +with an empty collection without specifying an `init` value, but in unambiguous cases the +[`one`](@ref) of the return type may be returned. + +Like [`mapreduce`](@ref), the associativity of the multiplications is not specified and the return +value of `f` may be reused for repeated elements. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -588,9 +712,12 @@ The return type is `Int` for signed integers of less than system word size, and `UInt` for unsigned integers of less than system word size. For all other arguments, a common return type is found to which all arguments are promoted. -The value returned for empty `itr` can be specified by `init`. It must be the -multiplicative identity (i.e. one) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be a multiplicative identity (i.e., one), +and it provides the return value for empty collections. It is generally an error to call `prod` +with an empty collection without specifying an `init` value, but in unambiguous cases the +[`one`](@ref) of the return type may be returned. + +Like [`reduce`](@ref), the associativity of the multiplications is not specified. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -671,10 +798,13 @@ end Return the largest result of calling function `f` on each element of `itr`. -The value returned for empty `itr` can be specified by `init`. It must be -a neutral element for `max` (i.e. which is less than or equal to any -other element) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an identity for `max` (i.e., less than or +equal to all values), and it provides the return value for empty collections. It is +an error to call `maximum` with an empty collection without specifying an `init` value. + +Like [`mapreduce`](@ref), the associativity of the `max` operations is not specified and +the return value of `f` may be reused for repeated elements. +Use [`mapfoldl(f, max, itr)`](@ref) for stronger guarantees on the evaluation strategy. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -698,10 +828,13 @@ maximum(f, a; kw...) = mapreduce(f, max, a; kw...) Return the smallest result of calling function `f` on each element of `itr`. -The value returned for empty `itr` can be specified by `init`. It must be -a neutral element for `min` (i.e. which is greater than or equal to any -other element) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an identity for `min` (i.e., less than or +equal to all values), and it provides the return value for empty collections. It is +an error to call `minimum` with an empty collection without specifying an `init` value. + +Like [`mapreduce`](@ref), the associativity of the `min` operations is not specified and +the return value of `f` may be reused for repeated elements. +Use [`mapfoldl(f, min, itr)`](@ref) for stronger guarantees on the evaluation strategy. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -725,10 +858,12 @@ minimum(f, a; kw...) = mapreduce(f, min, a; kw...) Return the largest element in a collection. -The value returned for empty `itr` can be specified by `init`. It must be -a neutral element for `max` (i.e. which is less than or equal to any -other element) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an identity for `max` (i.e., less than or +equal to all values), and it provides the return value for empty collections. It is +an error to call `maximum` with an empty collection without specifying an `init` value. + +Like [`reduce`](@ref), the associativity of the `max` operations is not specified. +Use [`foldl(max, itr)`](@ref) for stronger guarantees on the evaluation strategy. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -757,10 +892,12 @@ maximum(a; kw...) = mapreduce(identity, max, a; kw...) Return the smallest element in a collection. -The value returned for empty `itr` can be specified by `init`. It must be -a neutral element for `min` (i.e. which is greater than or equal to any -other element) as it is unspecified whether `init` is used -for non-empty collections. +The optional `init` keyword argument must be an identity for `min` (i.e., less than or +equal to all values), and it provides the return value for empty collections. It is +an error to call `minimum` with an empty collection without specifying an `init` value. + +Like [`reduce`](@ref), the associativity of the `min` operations is not specified. +Use [`foldl(min, itr)`](@ref) for stronger guarantees on the evaluation strategy. !!! compat "Julia 1.6" Keyword argument `init` requires Julia 1.6 or later. @@ -787,14 +924,15 @@ minimum(a; kw...) = mapreduce(identity, min, a; kw...) """ extrema(itr; [init]) -> (mn, mx) -Compute both the minimum `mn` and maximum `mx` element in a single pass, and return them +Compute both the [`minimum`](@ref) `mn` and [`maximum`](@ref) `mx` element in an efficient manner, and return them as a 2-tuple. -The value returned for empty `itr` can be specified by `init`. It must be a 2-tuple whose -first and second elements are neutral elements for `min` and `max` respectively -(i.e. which are greater/less than or equal to any other element). As a consequence, when -`itr` is empty the returned `(mn, mx)` tuple will satisfy `mn ≥ mx`. When `init` is -specified it may be used even for non-empty `itr`. +The optional `init` keyword argument must be a 2-tuple whose elements are identities +for `min` and `max`, respectively (i.e. which are greater/less than or equal to any +other element), and it provides the return value for empty collections. As a consequence, +the initial minimum must be greater than or equal to the initial maximum, and when +`itr` is empty the returned `(mn, mx)` tuple will have `mn ≥ mx`. It is +an error to call `extrema` with an empty collection without specifying an `init` value. !!! compat "Julia 1.8" Keyword argument `init` requires Julia 1.8 or later. @@ -816,15 +954,15 @@ extrema(itr; kw...) = extrema(identity, itr; kw...) """ extrema(f, itr; [init]) -> (mn, mx) -Compute both the minimum `mn` and maximum `mx` of `f` applied to each element in `itr` and -return them as a 2-tuple. Only one pass is made over `itr`. +Compute both the [`minimum`](@ref) `mn` and [`maximum`](@ref) `mx` of `f` applied to each element in `itr` and +return them as a 2-tuple in an efficient manner. -The value returned for empty `itr` can be specified by `init`. It must be a 2-tuple whose -first and second elements are neutral elements for `min` and `max` respectively -(i.e. which are greater/less than or equal to any other element). It is used for non-empty -collections. Note: it implies that, for empty `itr`, the returned value `(mn, mx)` satisfies -`mn ≥ mx` even though for non-empty `itr` it satisfies `mn ≤ mx`. This is a "paradoxical" -but yet expected result. +The optional `init` keyword argument must be a 2-tuple whose elements are identities +for `min` and `max`, respectively (i.e. which are greater/less than or equal to any +other returned value), and it provides the return value for empty collections. As a consequence, +the initial minimum must be greater than or equal to the initial maximum, and when +`itr` is empty the returned `(mn, mx)` tuple will have `mn ≥ mx`. It is +an error to call `extrema` with an empty collection without specifying an `init` value. !!! compat "Julia 1.2" This method requires Julia 1.2 or later.