From f36ffbb14e7c243dfbc1d54cc6efc922508f9b3f Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Tue, 26 Mar 2024 18:30:21 -0400 Subject: [PATCH 1/6] Add more clarity to `init`'s role in reduce and mapreduce --- base/reduce.jl | 54 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 22 deletions(-) diff --git a/base/reduce.jl b/base/reduce.jl index 6a0d46c61fcd9..1d85e621a619d 100644 --- a/base/reduce.jl +++ b/base/reduce.jl @@ -281,15 +281,29 @@ mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted, ifirst::Integer, ilast::Int mapreduce(f, op, itrs...; [init]) Apply function `f` to each element(s) in `itrs`, and then reduce the result using the binary -function `op`. If provided, `init` must be a neutral element for `op` that will be returned -for empty collections. It is unspecified whether `init` is used for non-empty collections. -In general, it will be necessary to provide `init` to work with empty collections. +function `op`. + +The order of function evaluations and the associativity of the reduction is unspecified. Some +implementations may reuse the return value of `f` for elements that appear multiple times in +`itr`. Use [`mapfoldl`](@ref) or [`mapfoldr`](@ref) for +strict left or right associativity and guaranteed invocation of `f` for every value. + +If provided, `init` serves as the return value for empty `itrs`. For non-empty iterators, +it is included in the reduction exactly once and ensures that every element in `itrs` is +used as an argument to `op`. Like the reduction itself, the exact order and associativity +of how `init` is included is not specified. It is generally an error to call `mapreduce` +with empty collections without specifying an `init` value, but in unambiguous cases the +identity value may be returned; see [`Base.reduce_empty`](@ref) for more details. 
[`mapreduce`](@ref) is functionally equivalent to calling -`reduce(op, map(f, itr); init=init)`, but will in general execute faster since no +`reduce(op, map(f, itrs...)...; init=init)`, but will in general execute faster since no intermediate collection needs to be created. See documentation for [`reduce`](@ref) and [`map`](@ref). +Some commonly-used operators may have special implementations of a mapped reduction, and +should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, +[`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. + !!! compat "Julia 1.2" `mapreduce` with multiple iterators requires Julia 1.2 or later. @@ -298,11 +312,6 @@ intermediate collection needs to be created. See documentation for [`reduce`](@r julia> mapreduce(x->x^2, +, [1:3;]) # == 1 + 4 + 9 14 ``` - -The associativity of the reduction is implementation-dependent. Additionally, some -implementations may reuse the return value of `f` for elements that appear multiple times in -`itr`. Use [`mapfoldl`](@ref) or [`mapfoldr`](@ref) instead for -guaranteed left or right associativity and invocation of `f` for every value. """ mapreduce(f, op, itr; kw...) = mapfoldl(f, op, itr; kw...) mapreduce(f, op, itrs...; kw...) = reduce(op, Generator(f, itrs...); kw...) @@ -452,13 +461,20 @@ _mapreduce(f, op, ::IndexCartesian, A::AbstractArrayOrBroadcasted) = mapfoldl(f, """ reduce(op, itr; [init]) -Reduce the given collection `itr` with the given binary operator `op`. If provided, the -initial value `init` must be a neutral element for `op` that will be returned for empty -collections. It is unspecified whether `init` is used for non-empty collections. +Reduce the given collection `itr` with the given binary operator `op`. + +The order of evaluations and the associativity of the reduction is unspecified. 
+This means that you shouldn't +use non-associative operations like `-` because it is undefined whether `reduce(-,[1,2,3])` +will be evaluated as `(1-2)-3` or `1-(2-3)`. Use +[`foldl`](@ref) or [`foldr`](@ref) for guaranteed left or right associativity. -For empty collections, providing `init` will be necessary, except for some special cases -(e.g. when `op` is one of `+`, `*`, `max`, `min`, `&`, `|`) when Julia can determine the -neutral element of `op`. +If provided, `init` serves as the return value for empty `itrs`. For non-empty iterators, +it is included in the reduction exactly once and ensures that every element in `itrs` is +used as an argument to `op`. Like the reduction itself, the exact order and associativity +of how `init` is included is not specified. It is generally an error to call `mapreduce` +with empty collections without specifying an `init` value, but in unambiguous cases the +identity value may be returned; see [`Base.reduce_empty`](@ref) for more details. Reductions for certain commonly-used operators may have special implementations, and should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, @@ -466,14 +482,8 @@ should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`su There are efficient methods for concatenating certain arrays of arrays by calling `reduce(`[`vcat`](@ref)`, arr)` or `reduce(`[`hcat`](@ref)`, arr)`. -The associativity of the reduction is implementation dependent. This means that you can't -use non-associative operations like `-` because it is undefined whether `reduce(-,[1,2,3])` -should be evaluated as `(1-2)-3` or `1-(2-3)`. Use [`foldl`](@ref) or -[`foldr`](@ref) instead for guaranteed left or right associativity. - Some operations accumulate error. Parallelism will be easier if the reduction can be -executed in groups. Future versions of Julia might change the algorithm. Note that the -elements are not reordered if you use an ordered collection. 
+executed in groups. Future versions of Julia might change the algorithm. # Examples ```jldoctest From 32e762b13bf9d263e67c31ed65e551c01c0e4397 Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Wed, 27 Mar 2024 11:35:47 -0400 Subject: [PATCH 2/6] fix too many dots...... Co-authored-by: mikmoore <95002244+mikmoore@users.noreply.github.com> --- base/reduce.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/base/reduce.jl b/base/reduce.jl index 1d85e621a619d..9e6d1c0b443ed 100644 --- a/base/reduce.jl +++ b/base/reduce.jl @@ -296,7 +296,7 @@ with empty collections without specifying an `init` value, but in unambiguous ca identity value may be returned; see [`Base.reduce_empty`](@ref) for more details. [`mapreduce`](@ref) is functionally equivalent to calling -`reduce(op, map(f, itrs...)...; init=init)`, but will in general execute faster since no +`reduce(op, map(f, itrs...); init=init)`, but will in general execute faster since no intermediate collection needs to be created. See documentation for [`reduce`](@ref) and [`map`](@ref). From 549ceac67e4794d2c2631ddb95141d1135d70707 Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Wed, 27 Mar 2024 17:40:20 -0400 Subject: [PATCH 3/6] Incorporate feedback; define left-most position for init --- base/reduce.jl | 116 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 75 insertions(+), 41 deletions(-) diff --git a/base/reduce.jl b/base/reduce.jl index 9e6d1c0b443ed..3e997ad2b7fcc 100644 --- a/base/reduce.jl +++ b/base/reduce.jl @@ -280,20 +280,33 @@ mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted, ifirst::Integer, ilast::Int """ mapreduce(f, op, itrs...; [init]) -Apply function `f` to each element(s) in `itrs`, and then reduce the result using the binary -function `op`. - -The order of function evaluations and the associativity of the reduction is unspecified. Some -implementations may reuse the return value of `f` for elements that appear multiple times in -`itr`. 
Use [`mapfoldl`](@ref) or [`mapfoldr`](@ref) for -strict left or right associativity and guaranteed invocation of `f` for every value. - -If provided, `init` serves as the return value for empty `itrs`. For non-empty iterators, -it is included in the reduction exactly once and ensures that every element in `itrs` is -used as an argument to `op`. Like the reduction itself, the exact order and associativity -of how `init` is included is not specified. It is generally an error to call `mapreduce` -with empty collections without specifying an `init` value, but in unambiguous cases the -identity value may be returned; see [`Base.reduce_empty`](@ref) for more details. +Apply function `f` to each element(s) in `itrs`, and then repeatedly call the 2 argument +function `op` with those results or results from previous `op` evaluations until a single value is returned. + +If provided, `init` is included exactly once as the left-most argument to `op` +for non-empty `itrs` and serves as the return value for empty `itrs`. It is +not transformed by the mapping function `f`. It is generally an error to call `mapreduce` +with empty collections without specifying an `init` value, but in unambiguous cases an +identity value for `op` may be returned; see [`Base.reduce_empty`](@ref) for more details. + +In contrast with [`mapfoldl`](@ref) and [`mapfoldr`](@ref), the sequence of +function evaluations and the associativity of the reduction is not specified +and may vary between different methods and Julia versions. +For example, `mapreduce(√, +, [1, 4, 9])` may be evaluated as either +`(√1+√4)+√9` (left-associative) _or_ `√1+(√4+√9)` (right-associative). +The return value for non-associative `op` functions may vary between +different methods and between Julia versions. For example, `-` is not +associative and thus `mapreduce(√, -, [1, 4, 9])` may return either +`-4.0` or `2.0` depending upon the exact method or version of Julia. 
+This is also true of some floating point operations that are typically +associative, for example `mapreduce(identity, +, [.1, .2, .3])` may return +either `0.6` or `0.6000000000000001`. + +While the associativity of the reduction is not defined, `mapreduce` does preserve +the ordering of the iterator for ordered collections. For example, +`mapreduce(uppercase, string, ['j','u','l','i','a'])` is guaranteed to always +return the properly-spelled `"JULIA"` because `Array`s are ordered collections; +the returned ordering is not guaranteed with an unordered collection like `Set`. [`mapreduce`](@ref) is functionally equivalent to calling `reduce(op, map(f, itrs...); init=init)`, but will in general execute faster since no @@ -309,8 +322,17 @@ should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`su # Examples ```jldoctest -julia> mapreduce(x->x^2, +, [1:3;]) # == 1 + 4 + 9 -14 +julia> mapreduce(√, +, [1, 4, 9]) +6.0 + +julia> mapreduce(identity, +, [.1, .2, .3]) ≈ 0.6 +true + +julia> mapreduce(uppercase, string, ['j','u','l','i','a']) +"JULIA" + +julia> mapreduce(uppercase, string, ['j','u','l','i','a'], init="Hello ") +"Hello JULIA" ``` """ mapreduce(f, op, itr; kw...) = mapfoldl(f, op, itr; kw...) @@ -461,37 +483,49 @@ _mapreduce(f, op, ::IndexCartesian, A::AbstractArrayOrBroadcasted) = mapfoldl(f, """ reduce(op, itr; [init]) -Reduce the given collection `itr` with the given binary operator `op`. - -The order of evaluations and the associativity of the reduction is unspecified. -This means that you shouldn't -use non-associative operations like `-` because it is undefined whether `reduce(-,[1,2,3])` -will be evaluated as `(1-2)-3` or `1-(2-3)`. Use -[`foldl`](@ref) or [`foldr`](@ref) for guaranteed left or right associativity. - -If provided, `init` serves as the return value for empty `itrs`. For non-empty iterators, -it is included in the reduction exactly once and ensures that every element in `itrs` is -used as an argument to `op`. 
Like the reduction itself, the exact order and associativity -of how `init` is included is not specified. It is generally an error to call `mapreduce` -with empty collections without specifying an `init` value, but in unambiguous cases the -identity value may be returned; see [`Base.reduce_empty`](@ref) for more details. - -Reductions for certain commonly-used operators may have special implementations, and +Repeatedly call the 2 argument function `op` with the element(s) in `itr` +or results from previous `op` evaluations until a single value is returned. + +If provided, `init` is included exactly once as the left-most argument to `op` +for non-empty `itr` and serves as the return value for empty `itr`. It is generally an error to call `reduce` +with empty collections without specifying an `init` value, but in unambiguous cases an +identity value for `op` may be returned; see [`Base.reduce_empty`](@ref) for more details. + +In contrast with [`foldl`](@ref) and [`foldr`](@ref), the associativity of the reduction is not specified +and may vary between different methods and Julia versions. +For example, `reduce(+, [1, 2, 3])` may be evaluated as either +`(1+2)+3` (left-associative) _or_ `1+(2+3)` (right-associative). +The return value for non-associative `op` functions may vary between +different methods and between Julia versions. For example, `-` is not +associative and thus `reduce(-, [1, 2, 3])` may return either +`-4` or `2` depending upon the exact method or version of Julia. +This is also true of some floating point operations that are typically +associative, for example `reduce(+, [.1, .2, .3])` may return +either `0.6` or `0.6000000000000001`. + +While the associativity of the reduction is not defined, `reduce` does preserve +the ordering of the iterator for ordered collections. 
For example, +`reduce(string, ['J','u','l','i','a'])` is guaranteed to always +return the properly-spelled `"Julia"` because `Array`s are ordered collections; +the returned ordering is not guaranteed with an unordered collection like `Set`. + +Some commonly-used operators may have special implementations of a reduction, and should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, [`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. -There are efficient methods for concatenating certain arrays of arrays -by calling `reduce(`[`vcat`](@ref)`, arr)` or `reduce(`[`hcat`](@ref)`, arr)`. - -Some operations accumulate error. Parallelism will be easier if the reduction can be -executed in groups. Future versions of Julia might change the algorithm. # Examples ```jldoctest -julia> reduce(*, [2; 3; 4]) -24 +julia> reduce(+, [1, 2, 3]) +6 + +julia> reduce(+, [.1, .2, .3]) ≈ 0.6 +true + +julia> reduce(string, ['J','u','l','i','a']) +"Julia" -julia> reduce(*, [2; 3; 4]; init=-1) --24 +julia> reduce(string, ['J','u','l','i','a'], init="Hello ") +"Hello Julia" ``` """ reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...) From 19b76d4b77dafbd2b1fd35c9ecf1808fbd21f4b0 Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Thu, 28 Mar 2024 14:30:50 -0400 Subject: [PATCH 4/6] Apply suggestions from code review Co-authored-by: Steven G. Johnson --- base/reduce.jl | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/base/reduce.jl b/base/reduce.jl index 3e997ad2b7fcc..f44280af14a48 100644 --- a/base/reduce.jl +++ b/base/reduce.jl @@ -298,15 +298,17 @@ The return value for non-associative `op` functions may vary between different methods and between Julia versions. For example, `-` is not associative and thus `mapreduce(√, -, [1, 4, 9])` may return either `-4.0` or `2.0` depending upon the exact method or version of Julia. 
-This is also true of some floating point operations that are typically -associative, for example `mapreduce(identity, +, [.1, .2, .3])` may return +Because floating-point roundoff errors typically break associativity, +even for operations like + that are associative in exact arithmetic, +this also means that the floating-point errors incurred by mapreduce +are implementation-defined; for example `mapreduce(identity, +, [.1, .2, .3])` may return either `0.6` or `0.6000000000000001`. While the associativity of the reduction is not defined, `mapreduce` does preserve -the ordering of the iterator for ordered collections. For example, -`mapreduce(uppercase, string, ['j','u','l','i','a'])` is guaranteed to always +the ordering of the iterator for ordered collections, so that the result does *not* require `op` to be commutative. For example, +`mapreduce(uppercase, *, ['j','u','l','i','a'])` is guaranteed to always return the properly-spelled `"JULIA"` because `Array`s are ordered collections; -the returned ordering is not guaranteed with an unordered collection like `Set`. +in contrast, the operand ordering is not guaranteed with an unordered collection like `Set`. [`mapreduce`](@ref) is functionally equivalent to calling `reduce(op, map(f, itrs...); init=init)`, but will in general execute faster since no @@ -314,7 +316,7 @@ intermediate collection needs to be created. See documentation for [`reduce`](@r [`map`](@ref). Some commonly-used operators may have special implementations of a mapped reduction, and -should be used instead: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, +are recommended instead of `mapreduce`: [`maximum`](@ref)`(itr)`, [`minimum`](@ref)`(itr)`, [`sum`](@ref)`(itr)`, [`prod`](@ref)`(itr)`, [`any`](@ref)`(itr)`, [`all`](@ref)`(itr)`. !!! 
compat "Julia 1.2" @@ -328,10 +330,10 @@ julia> mapreduce(√, +, [1, 4, 9]) julia> mapreduce(identity, +, [.1, .2, .3]) ≈ 0.6 true -julia> mapreduce(uppercase, string, ['j','u','l','i','a']) +julia> mapreduce(uppercase, *, ['j','u','l','i','a']) "JULIA" -julia> mapreduce(uppercase, string, ['j','u','l','i','a'], init="Hello ") +julia> mapreduce(uppercase, *, ['j','u','l','i','a'], init="Hello ") "Hello JULIA" ``` """ From 793a4e0c73012c77c42131aec1fd27ca837fb0a7 Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Fri, 29 Mar 2024 11:50:25 -0400 Subject: [PATCH 5/6] Add NEWS.md --- NEWS.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/NEWS.md b/NEWS.md index 72b4629fe4174..14968586df914 100644 --- a/NEWS.md +++ b/NEWS.md @@ -77,6 +77,14 @@ New library features Standard library changes ------------------------ +* The `init` keyword for `reduce` and other reduction functions without guaranteed + associativity (`mapreduce`, `maximum`, `minimum`, `sum`, `prod`, `any`, and `all`) + now provides greater gaurantees on how its value is incorporated into the reduction: + it is used exactly once as the left-most argument for all non-empty collections, + and it is no longer required to be a "neutral" operand for the reduction. + Previously, its semantics for non-empty collections was explicitly not specified, allowing + implementations to use it 0, 1, or more times in the reduction ([#53871]). 
+ #### StyledStrings #### JuliaSyntaxHighlighting From 7859c7ab9aa4ebf8725f902e8ce68f316718763c Mon Sep 17 00:00:00 2001 From: Matt Bauman Date: Fri, 29 Mar 2024 11:51:58 -0400 Subject: [PATCH 6/6] typo fix --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 14968586df914..5dbd8ee20c95d 100644 --- a/NEWS.md +++ b/NEWS.md @@ -79,7 +79,7 @@ Standard library changes * The `init` keyword for `reduce` and other reduction functions without guaranteed associativity (`mapreduce`, `maximum`, `minimum`, `sum`, `prod`, `any`, and `all`) - now provides greater gaurantees on how its value is incorporated into the reduction: + now provides greater guarantees on how its value is incorporated into the reduction: it is used exactly once as the left-most argument for all non-empty collections, and it is no longer required to be a "neutral" operand for the reduction. Previously, its semantics for non-empty collections was explicitly not specified, allowing