Skip to content

fix map! undefined value exposure #56673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 19, 2025

Conversation

adienes
Copy link
Member

@adienes adienes commented Nov 24, 2024

hopefully more targeted / less disruptive than #50647 , would (at least partially) fix #36235

may need extra documentation explaining that with this implementation, @inbounds map!(dest, As) will always iterate through the indices of As[1] regardless of the other sizes

tbh, the existing implementation is pretty fundamentally incorrect and does not match the behavior explicitly described in the docstring, so I think that this PR (or #50647 , or something else similar) should merge regardless of benchmarks impact. however, I would appreciate if someone could trigger a benchmark run just so we can see the impact :)

@adienes
Copy link
Member Author

adienes commented Nov 24, 2024

pretty sure build failures are unrelated

@LilithHafner
Copy link
Member

LilithHafner commented Nov 27, 2024

This looks like a simple typo:

@boundscheck LinearIndices(dest) == idxs1 && all(x -> LinearIndices(x) == idxs1, As)

That's not how @boundscheck works. That won't throw even if the indices don't line up. Would it work to simply fix that typo or would that be too strict because the typo has been present for six years?

@adienes
Copy link
Member Author

adienes commented Nov 27, 2024

typo aside, it's checking the wrong thing. that's checking that literally all the indices are equal.

map is supposed to stop when any of the constituent arguments are exhausted --- this should be honored even in the presence of @inbounds ! I only made it skip this check in the presence of @inbounds in the PR because I figured there would be complaints if the added indices intersection impacted performance

but for a correct implementation to match the docstring, I think intersect(map(eachindex ∘ LinearIndices, As)...) --- or something else to that effect must be computed, @inbounds or no

this is one of those cases where Julia skipped directly to "make it fast" before "make it right" ☹️

@LilithHafner LilithHafner added the bugfix This change fixes an existing bug label Nov 27, 2024
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this significant issue. Much appreciated. A related issue that you might want to be aware of, but don't need to fix in this PR is #56696.

I agree that it is acceptable to take some performance regressions in exchange for fixing this bug. I have some refactoring suggestions to make it a bit simpler.

I'm planning to run benchmarks once we agree on an implementation, but if you want me to run them right away instead I'd be happy to do that.

Comment on lines 3465 to 3470
idxs1 = eachindex(LinearIndices(As[1]))
@boundscheck begin
idxs1 = intersect(map(eachindex ∘ LinearIndices, As)...)
checkbounds(dest, idxs1)
end
for i = idxs1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
idxs1 = eachindex(LinearIndices(As[1]))
@boundscheck begin
idxs1 = intersect(map(eachindex LinearIndices, As)...)
checkbounds(dest, idxs1)
end
for i = idxs1
idxs = mapreduce(LinearIndices, intersect, As)
checkbounds(dest, inds)
for i = idxs
  • I don't see why we need eachindex is here
  • We should only include bounds-check elision if benchmarks reveal that it makes a difference. In this case, we would also need to add @propagate_inbounds to map! to make it take effect.
  • I think that mapreduce(LinearIndices, intersect, As) is more elegant than intersect(map(LinearIndices, As)...), unless there is some performance reason to use the latter.
  • If always compute the intersection of indices then inds is a better name than inds1

Copy link
Member Author

@adienes adienes Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the comments!

the eachindex I added for the following reason

julia> mapreduce(LinearIndices, intersect, [A, B])
3-element Vector{Int64}:
 1
 2
 3

julia> mapreduce(eachindex ∘ LinearIndices, intersect, [A, B])
Base.OneTo(3)

so I figured that if we can hit fast-path intersection for ranges rather than collecting the indices we should. Another solution might be to add a fast intersect(::LinearIndices, ::LinearIndices) ?

the mapreduce formulation indeed looks nicer to me as well.

  • If always compute the intersection of indices then inds is a better name than inds1

Agreed! but I suppose we should decide if in the first place @inbounds should allow skipping the check and iterate through the indices of the first argument? I think this is a decision on semantics rather than implementation that I am not empowered to make. the docs should probably be updated if this is intended

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the eachindex I added for the following reason

Good reason.

I suppose we should decide if in the first place @inbounds should allow skipping the check and iterate through the indices of the first argument?

@inbounds should never expose documented semantics that are otherwise not present. This is because you can't count on bounds checks being removed. For exmaple, Julia could be run with -check-bound=yes or the boundschecking might not be inlined into the callsite where @inbounds is present. In either case, the bounds checks will remain. Consequently I think the answer to that question is no: @inbounds should not change the semantics other than to skip bounds checks.

@@ -910,6 +910,9 @@ include("generic_map_tests.jl")
generic_map_tests(map, map!)
@test_throws ArgumentError map!(-, [1])

# Issue #30624
@test map!(+, [0,0,0], [1,2], [10,20,30], [100]) == [111,0,0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you also add a test for #36235 (comment), please?

@adienes
Copy link
Member Author

adienes commented Nov 28, 2024

  • added a bunch more tests (many failing or segfaulting on main)
  • removed all @inbounds until it's clear that these are needed for performance, we can re-add
  • I would like to abdicate responsibility for fixing OffsetArrays interactions :)

@LilithHafner LilithHafner added minor change Marginal behavior change acceptable for a minor release needs pkgeval Tests for all registered packages should be run with this change labels Dec 2, 2024
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, map!(+, ones(1), ones(2, 2)) and map!(+, ones(1), ones(2, 2), ones(2, 2)) worked and this makes them throw (as the docstring indicates they should).

Performance is, at a glance, sometimes slightly better and sometimes slightly worse than it was on master. Correctness first, performance optimization second so I'm not too concerned.

I like that the implementation for map_n! is a bit simpler now and that it clearly aligns with the implementations of 1-arg and 2-arg map.

@LilithHafner
Copy link
Member

  • I would like to abdicate responsibility for fixing OffsetArrays interactions :)

Unfortunately we have to deal with them before merging because otherwise existing, valid, code that uses OffsetArrays will break. For example,

julia> R = fill(0, -4:-1);

julia> B = OffsetArray(reshape(1:12, 4, 3), -5, 6);

julia> minimum!(R, B)
4-element OffsetArray(::Vector{Int64}, -4:-1) with eltype Int64 with indices -4:-1:
 1
 2
 3
 4

This is a test failure from CI which fails because the implementation of minimum! (which lives in Base) for arrays with unconventional indexing relies on a bug that this PR fixes:

julia> x = fill(0, 1:3);

julia> y = fill(1, 2:4);

julia> map!(identity, x, y)
3-element OffsetArray(::Vector{Int64}, 1:3) with eltype Int64 with indices 1:3:
 1
 1
 1

@nanosoldier runtests()

@adienes
Copy link
Member Author

adienes commented Dec 2, 2024

I guess we'll have to iterate through destination idxs separately to fix the OffsetArrays issue? then dest will be treated just as a fully generic iterable / assignable collection.

function map!(f::F, dest::AbstractArray, A::AbstractArray, B::AbstractArray) where F
    inds = intersect(eachindex(LinearIndices(A)), eachindex(LinearIndices(B)))
    @boundscheck length(dest) >= length(inds)
    dest_idx = firstindex(dest)
    for i = inds
        dest[dest_idx] = f(A[i], B[i])
        dest_idx = nextind(dest, dest_idx)
    end
    return dest
end

I could be wrong but this seems more likely to hurt performance.

Would there be a meaningful difference here between iterating through firstindex, nextindex of dest directly vs through LinearIndices(dest) ?

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@LilithHafner
Copy link
Member

Oh, wow! Nanosoldier looks clean. Rerunning failures to be sure:

@nanosoldier runtests(["ApproxLogFunction", "TestSetExtensions", "FunctionOperators", "RemoteFiles", "ChebyshevFiltering", "EinExprs", "Andes", "OpenTelemetryExporterPrometheus", "RegressionTables", "GLPK", "CDDLib", "OrdinaryDiffEqSSPRK", "MPIMeasurements", "ManifoldDiffEq", "OceanRobots", "DynamicMovementPrimitives", "ActiveInference", "Yunir", "ImmersedLayers", "FrequencySweep", "LowRankIntegrators", "NetworkJumpProcesses", "BoxCox"])

@LilithHafner
Copy link
Member

I guess we'll have to iterate through destination idxs separately to fix the OffsetArrays issue?

Given how nice PkgEval is looking, another option would be to fix the minimum! implementation to not rely on the map! bug.

Would there be a meaningful difference here between iterating through firstindex, nextindex of dest directly vs through LinearIndices(dest) ?

I doubt it, but hopefully that won't be necessary. If we can implement it in a way that isn't practically breaking, I'd love to make map! take offset indices into account (like map) rather than iterating separately.

@adienes
Copy link
Member Author

adienes commented Dec 4, 2024

I'm finding all the indirection in reducedim.jl pretty inscrutable 😓

but I guess the "real" issue is in mapfirst! ?

function mapfirst!(f::F, R::AbstractArray, A::AbstractArray{<:Any,N}) where {N, F}
    lsiz = check_reducedims(R, A)
    t = _firstreducedslice(axes(R), axes(A))
    map!(f, R, view(A, t...))
end

in your example, we have

julia> t
(IdOffsetRange(values=-4:-1, indices=-4:-1), 7:7)

so then

julia> LinearIndices(view(B, t...)) |> collect
4×1 Matrix{Int64}:
 1
 2
 3
 4

on main, that final map! call will happily map indices [1 2 3 4] into [-4, -3, -2, -1] whereas this PR makes it (correctly?) throw a BoundsError

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@adienes
Copy link
Member Author

adienes commented Dec 4, 2024

actually, @mbauman 's tour de force #55318 with this PR rebased on top seems to solve the offset arrays case. so maybe the solution is rebase & wait to get more help on that one?

@adienes
Copy link
Member Author

adienes commented Dec 4, 2024

at the very least, this PR would swap "bad" bugs (segfaults, undefined behavior) for "regular" bugs (something throwing a BoundsError when it should not)

@adienes
Copy link
Member Author

adienes commented Jan 26, 2025

updates #30389. also breadcrumb to #46352

@adienes
Copy link
Member Author

adienes commented Apr 16, 2025

#old
map!(f, R, view(A, t...))

#new
map!(f, view(R, firstindex(R):lastindex(R)), view(A, t...))

in mapfirst! is the main update. it looks (and feels) a bit odd to assign into a view like this, but it seems to pass all the test cases now, existing, newly added, and offset-arrays alike.

since a more comprehensive refactor to dimensional reductions appears to be still a ways away, I'd suggest leniency for "ugly" bugfixes like this.

@adienes
Copy link
Member Author

adienes commented Apr 16, 2025

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

❗ Packages that crashed

6 packages crashed on the previous version too.

✖ Packages that failed

23 packages failed only on the current version.

  • Illegal method overwrites during precompilation: 1 packages
  • Package has test failures: 3 packages
  • Package tests unexpectedly errored: 1 packages
  • Tests became inactive: 1 packages
  • Test duration exceeded the time limit: 17 packages

1536 packages failed on the previous version too.

✔ Packages that passed tests

27 packages passed tests only on the current version.

  • Other: 27 packages

5121 packages passed tests on the previous version too.

~ Packages that at least loaded

8 packages successfully loaded only on the current version.

  • Other: 8 packages

2855 packages successfully loaded on the previous version too.

➖ Packages that were skipped altogether

905 packages were skipped on the previous version too.

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

✖ Packages that failed

1 packages failed only on the current version.

  • Test log exceeded the size limit: 1 packages

2 packages failed on the previous version too.

✔ Packages that passed tests

1 packages passed tests only on the current version.

  • Other: 1 packages

16 packages passed tests on the previous version too.

~ Packages that at least loaded

3 packages successfully loaded on the previous version too.

@adienes
Copy link
Member Author

adienes commented Apr 18, 2025

I think this is clean actually. the single remaining failure has Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable. in the logs on both builds, so I don't think that test should be trusted anyway.

@adienes adienes added status: waiting for PR reviewer and removed needs pkgeval Tests for all registered packages should be run with this change labels Apr 18, 2025
@mbauman
Copy link
Member

mbauman commented Apr 18, 2025

(It took me a bit to understand what was happening here... my incremental review snowballed and became much more than a small comment; my apologies for the harsh appearance.)

@adienes
Copy link
Member Author

adienes commented Apr 18, 2025

no worries; feedback is good! thank you for the review

@mbauman
Copy link
Member

mbauman commented Apr 18, 2025

Let's detangle all of this. At the core, there are three intended behavior changes here, which is precisely how I got all tangled up in knots upon reviewing it.

  • The 3+ arg version has a buggy no-op @boundscheck. Let's change that to be the branch between the shared linear index (if we can) or the zipped eachindex thing. That's obviously a clear and necessary bugfix. Doing this first will make the 3+ arg version silently stop after hitting the end of a shorter destination (instead of doing memory corruption). It won't match the docs — not yet — because that's step 2:
  • None of the methods enforce map!'s documented requirement that dest be at least as large as the smallest array. That could be added as a precondition to all three methods.
  • None of the methods enforce map's documented requirement that — if the arrays have the same dimensionality — they must also have the same axes. This isn't documented for map!, but perhaps could and should be.

@adienes
Copy link
Member Author

adienes commented Apr 18, 2025

ok, I narrowed the scope. only map_n! is changed now with the implementation you suggested.

as you note, this would leave open the question of input validation when the docstring is violated, but at least it fixes the segfaults and uninitialized value exposure (which was my main goal)

@mbauman mbauman removed the minor change Marginal behavior change acceptable for a minor release label Apr 18, 2025
@LilithHafner LilithHafner merged commit 0947114 into JuliaLang:master Apr 19, 2025
5 of 8 checks passed
@nsajko nsajko added the arrays [a, r, r, a, y, s] label Apr 19, 2025
@mbauman mbauman added backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 backport 1.12 Change should be backported to release-1.12 labels Apr 19, 2025
KristofferC pushed a commit that referenced this pull request Apr 22, 2025
@KristofferC KristofferC mentioned this pull request Apr 22, 2025
51 tasks
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Apr 25, 2025
KristofferC pushed a commit that referenced this pull request Apr 25, 2025
@KristofferC KristofferC mentioned this pull request Apr 25, 2025
71 tasks
LebedevRI pushed a commit to LebedevRI/julia that referenced this pull request May 2, 2025
KristofferC pushed a commit that referenced this pull request Jun 4, 2025
@KristofferC KristofferC mentioned this pull request Jun 4, 2025
75 tasks
KristofferC pushed a commit that referenced this pull request Jun 5, 2025
KristofferC pushed a commit that referenced this pull request Jul 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

map! resulting in undefined values
6 participants