Skip to content

Throw an error in checktask when a task failed #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 7, 2025

Conversation

efaulhaber
Copy link
Contributor

@efaulhaber efaulhaber commented Jan 7, 2025

Solves JuliaSIMD/Polyester.jl#153.

julia> @batch for j in axes(x, 2)
                  for i in 1:5
                      if i > 3 && j in (2, 4, 9)
                          error("some error")
                      end
                      x[i, j] = 1
                  end
              end
ERROR: TaskFailedException
Stacktrace:
 [1] checktask(tid::UInt32)
   @ ThreadingUtilities ~/.julia/dev/ThreadingUtilities/src/threadtasks.jl:74
 [2] wait
   @ ~/.julia/packages/ThreadingUtilities/3z3g0/src/threadtasks.jl:88 [inlined]
 [3] wait
   @ ~/.julia/packages/ThreadingUtilities/3z3g0/src/threadtasks.jl:81 [inlined]
 [4] macro expansion
   @ ~/.julia/packages/Polyester/eqrC9/src/batch.jl:257 [inlined]
 [5] _batch_no_reserve
   @ ~/.julia/packages/Polyester/eqrC9/src/batch.jl:168 [inlined]
 [6] batch(::var"#56#57", ::Val{…}, ::Tuple{}, ::Tuple{}, ::Tuple{…}, ::Static.StaticInt{…}, ::Static.StaticInt{…}, ::Polyester.NoLoop, ::Polyester.CombineIndices, ::Matrix{…})
   @ Polyester ~/.julia/packages/Polyester/eqrC9/src/batch.jl:334
 [7] top-level scope
   @ ~/.julia/packages/Polyester/eqrC9/src/closure.jl:456

    nested task error: some error
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:35
     [2] macro expansion
       @ ./REPL[25]:4 [inlined]
     [3] #56
       @ ~/.julia/packages/Polyester/eqrC9/src/closure.jl:309 [inlined]
     [4] (::Polyester.BatchClosure{var"#56#57", ManualMemory.Reference{Tuple{…}}, false, Tuple{}})(p::Ptr{UInt64})
       @ Polyester ~/.julia/packages/Polyester/eqrC9/src/batch.jl:11
     [5] _call
       @ ~/.julia/packages/ThreadingUtilities/3z3g0/src/threadtasks.jl:11 [inlined]
     [6] (::ThreadingUtilities.ThreadTask)()
       @ ThreadingUtilities ~/.julia/dev/ThreadingUtilities/src/threadtasks.jl:29
Some type information was truncated. Use `show(err)` to see complete types.

Note that JuliaSIMD/Polyester.jl#154 should probably get merged first.

@efaulhaber
Copy link
Contributor Author

@oscardssmith ?

@oscardssmith oscardssmith reopened this Mar 5, 2025
@oscardssmith
Copy link
Member

this is a package in not especially familiar with, so I wanted to see green CI, but CI seems somewhat broken

@efaulhaber
Copy link
Contributor Author

Note that this is what @chriselrod suggested in JuliaSIMD/Polyester.jl#153.

@efaulhaber
Copy link
Contributor Author

So, how do we proceed here?

@efaulhaber
Copy link
Contributor Author

Bump

@oscardssmith oscardssmith reopened this Apr 1, 2025
@oscardssmith
Copy link
Member

Sorry for this taking so long. CI for this repo was broken (fixed by #55)

@oscardssmith oscardssmith reopened this Apr 1, 2025
@oscardssmith
Copy link
Member

Looks like CI is fixed! (although it still is requiring manual approval for some reason)

@oscardssmith
Copy link
Member

@efaulhaber looks like this PR is broken.

@efaulhaber
Copy link
Contributor Author

I spent the last half hour trying to understand this test, but I still don't know what it does:

struct Copy{P} end
function (::Copy{P})(p::Ptr{UInt}) where {P}
  _, (ptry, ptrx, N) = ThreadingUtilities.load(p, P, 2 * sizeof(UInt))
  N > 0 || throw("This function throws if N == 0 for testing purposes.")
  @simd ivdep for n  1:N
    unsafe_store!(ptry, unsafe_load(ptrx, n), n)
  end
  ThreadingUtilities._atomic_store!(p, ThreadingUtilities.SPIN)
end
@generated function copy_ptr(::A, ::B) where {A,B}
  c = Copy{Tuple{A,B,Int}}()
  quote
    @cfunction($c, Cvoid, (Ptr{UInt},))
  end
end
function setup_copy!(p, y, x)
  N = length(y)
  @assert length(x) == N
  py = pointer(y)
  px = pointer(x)
  fptr = copy_ptr(py, px)
  offset = ThreadingUtilities.store!(p, fptr, sizeof(UInt))
  ThreadingUtilities.store!(p, (py, px, N), offset)
end

@inline launch_thread_copy!(tid, y, x) = ThreadingUtilities.launch(setup_copy!, tid, y, x)

function test_copy(tid, N = 100_000)
  a = rand(N)
  b = rand(N)
  c = rand(N)
  x = similar(a) .= NaN
  y = similar(b) .= NaN
  z = similar(c) .= NaN
  GC.@preserve a b c x y z begin
    launch_thread_copy!(tid, x, a)
    yield()
    @assert !ThreadingUtilities.wait(tid)
    launch_thread_copy!(tid, y, b)
    yield()
    @assert !ThreadingUtilities.wait(tid)
    launch_thread_copy!(tid, z, c)
    yield()
    @assert !ThreadingUtilities.wait(ThreadingUtilities.taskpointer(tid))
  end
  @test a == x
  @test b == y
  @test c == z
end

The yield() calls wait, which throws the error throw("This function throws if N == 0 for testing purposes.").
On main, this error is just ignored, which is why this test worked before.

@oscardssmith
Copy link
Member

@chriselrod do you know what's should happen here?

@chriselrod
Copy link
Member

for tid eachindex(ThreadingUtilities.TASKS)
launch_thread_copy!(tid, Float64[], Float64[])
end
sleep(1)
@test all(istaskfailed, ThreadingUtilities.TASKS)
@test all(ThreadingUtilities.wait, eachindex(ThreadingUtilities.TASKS))
@test !any(istaskfailed, ThreadingUtilities.TASKS)

This is supposed to

  1. Make all tasks error.
  2. sleep
  3. check that all tasks have errored
  4. ThreadingUtilities.wait on them to reset them, which returns true to indicate that a reset had occured
  5. test that none of them are in a failed state.

test_copy doesn't get called until after this, once all tests are in a non-failed state. It gets called with non-0 N, so it won't fail with an N==0 error.

This PR changes whose responsibility it is to do what. Before, wait is supposed to check for success/failure, and forward any errors.

@efaulhaber
Copy link
Contributor Author

That makes sense. Thanks! I fixed the tests now and added a few comments.
I now test that wait throws an error for each thread.

@efaulhaber
Copy link
Contributor Author

@oscardssmith Most tests are passing now. The nightly builds and docs failures seem to be unrelated.

@oscardssmith oscardssmith merged commit 142ab6c into JuliaSIMD:main Apr 7, 2025
31 of 34 checks passed
@efaulhaber efaulhaber deleted the checktask-error branch April 7, 2025 13:16
@efaulhaber
Copy link
Contributor Author

@oscardssmith would you mind also adding a new release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants