parallel partitioned shuffle #50970


Open · wants to merge 2 commits into master
Conversation

@tlcz commented Aug 18, 2023

add ppshuffle, pprandperm to stdlib.Random (ppmisc.jl)

@tlcz (Author) commented Aug 18, 2023

Hi @JeffBezanson,
@bkamins mentioned some time ago that you expressed interest in a parallelized version of Random.shuffle that we developed for generating large random graphs, for possible inclusion in stdlib.Random. Here is a proposed implementation.
The method works in two steps: 1) partition the input into random partitions, then 2) shuffle the partitions in parallel if multi-threading is enabled. This has a twofold effect: 1) a speedup from better cache utilization and 2) a speedup from parallel processing.
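For intuition, the two phases can be sketched in a few lines. This is a simplified sequential illustration under names of my own choosing (`partitioned_shuffle` is not the PR's code); the per-partition loop in phase 2 is what becomes parallel in the actual implementation:

```julia
using Random

# Sketch of a partitioned shuffle (illustrative only, not the PR's code).
# Phase 1: scatter each element into one of `nparts` buckets chosen
# uniformly at random. Phase 2: shuffle each bucket independently;
# the buckets are disjoint, so this loop is the parallelizable part.
# Concatenating the shuffled buckets yields a uniform random permutation.
function partitioned_shuffle(rng::AbstractRNG, A::AbstractVector, nparts::Int)
    parts = [eltype(A)[] for _ in 1:nparts]
    for x in A                       # phase 1: random partitioning
        push!(parts[rand(rng, 1:nparts)], x)
    end
    for p in parts                   # phase 2: independent shuffles
        shuffle!(rng, p)
    end
    return reduce(vcat, parts)
end

v = partitioned_shuffle(Random.default_rng(), collect(1:10_000), 8)
isperm(v)  # true
```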
Here are examples run on my 4-core i5-9300H @2.4GHz with 4 julia threads:

```julia
julia> n = Int32(1e8);

julia> @time v = shuffle(Base.OneTo(n));
  2.953982 seconds (3 allocations: 381.470 MiB, 0.06% gc time)

julia> @time v = ppshuffle(Base.OneTo(n));
  0.408063 seconds (82 allocations: 381.485 MiB, 0.17% gc time)

julia> @time randperm!(v);
  2.761123 seconds

julia> @time pprandperm!(v);
  0.395918 seconds (80 allocations: 15.500 KiB)

julia> isperm(v)
true
```

Please let us know what you think.
Regards,
@tolcz

@tlcz (Author) commented Aug 19, 2023

Below is the rationale behind the method, recently presented at WAW2023 (slides 10-15, 18-21):
tolczak_waw2023.pdf

@oscardssmith added the performance (Must go faster) and randomness (Random number generation and the Random stdlib) labels Aug 19, 2023
@oscardssmith (Member) commented Aug 19, 2023

Imo if these are strictly better than the regular shuffle and randperm they should be the method used by default (presumably falling back to the single threaded case automatically for small arguments).

Edit: it also seems like the number of threads used should be user selectable.

Review comment on ppmisc.jl:

```julia
function ppshuffle!(r::TaskLocalRNG, B::AbstractArray{T}, A::Union{AbstractArray{T}, Base.OneTo{T}}) where {T<:Integer}
    nparts = max(2, (length(A) * sizeof(T)) >> 21)
```

(Member) commented: where does this come from?

@tlcz (Author) commented Aug 19, 2023
An experimental rule of thumb: target a partition size of 2 MiB (hence the >> 21) and a minimum partition count of 2. It should be replaced by a more robust heuristic in the future.
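Concretely, `>> 21` divides the input's byte size by 2^21 bytes = 2 MiB, so the line under review targets roughly 2 MiB per partition with a floor of two partitions. Restated as a standalone helper (the function name `nparts` is mine):

```julia
# ~2 MiB target partition size, minimum of 2 partitions
nparts(len, elsize) = max(2, (len * elsize) >> 21)

nparts(10^8, sizeof(Int32))   # 190 partitions for the 10^8 Int32 benchmark
nparts(1_000, sizeof(Int32))  # 2: small inputs still get the floor
```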

@bkamins (Member) commented Aug 19, 2023

> Imo if these are strictly better than the regular shuffle and randperm they should be the method used by default (presumably falling back to the single threaded case automatically for small arguments).

These functions use more memory than the standard functions, so there is a trade-off (though it is usually worth paying, as the overhead is small). @tolcz - can you comment please on the memory allocation comparison? Thank you!

@tlcz (Author) commented Aug 19, 2023

Hi, thank you for the review and for your comments.

> Imo if these are strictly better than the regular shuffle and randperm they should be the method used by default (presumably falling back to the single threaded case automatically for small arguments).

A fallback to sequential processing for input that is not 'large enough' for parallel processing is a good idea. In fact, it was implemented in an earlier version of the code and could easily be restored. I removed it because finding an optimal 'transition size' is platform-dependent, so I decided to separate the methods and leave the choice to the user, at least for now.
In addition, note the different signatures: shuffle! is in-place while ppshuffle! is not.

> Edit: it also seems like the number of threads used should be user selectable.

I agree, such flexibility is desirable. The current version simply uses the number of threads available to the running Julia process. I will consider this change for a subsequent release.
It is worth noting, however, that this is a limitation of the @threads macro rather than of the code itself. As far as I am aware, there is currently no way to set the number of threads @threads uses at runtime.
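One possible workaround (my own sketch, not part of this PR) is to bypass `@threads` entirely and spawn a user-chosen number of tasks over index chunks with `Threads.@spawn`; the helper name `foreach_limited` is hypothetical:

```julia
# Run `f` over 1:nitems using at most `ntasks` concurrent tasks,
# instead of letting Threads.@threads claim every available thread.
function foreach_limited(f, nitems::Int, ntasks::Int)
    chunks = Iterators.partition(1:nitems, cld(nitems, ntasks))
    tasks = [Threads.@spawn foreach(f, chunk) for chunk in chunks]
    foreach(wait, tasks)
end

acc = zeros(Int, 16)
foreach_limited(i -> (acc[i] = i), 16, 4)  # at most 4 tasks in flight
```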

@tlcz (Author) commented Aug 19, 2023

> These functions use more memory than standard functions so there is a trade-off (that is usually worth to pay though as the overhead is small). @tolcz - can you comment please on the memory allocation comparison? Thank you!

Yes, as is usual for problems that are not embarrassingly parallel, parallel processing carries some overhead, and the problem size should be 'large enough' to compensate for it. In our case there is an auxiliary nparts × nthreads() Array{Int} for tracking the decomposition and reassembly of the input array. In addition, ppshuffle! (in contrast to shuffle!) is not in-place and requires an O(n)-size output array.
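A rough back-of-envelope with the benchmark's sizes (assuming 64-bit `Int` for the bookkeeping array; the helper name `aux_bytes` is mine):

```julia
# Bookkeeping: an nparts × nthreads() Array{Int}
aux_bytes(nparts, nthreads) = nparts * nthreads * sizeof(Int)

aux_bytes(190, 4)     # 6_080 bytes: kilobytes, the same order as the
                      # ~15.5 KiB total that pprandperm! reported above
10^8 * sizeof(Int32)  # 400_000_000 bytes (~381.5 MiB): the O(n) output
                      # array ppshuffle! needs because it is not in-place
```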
