parallel partitioned shuffle #50970
base: master
Conversation
Hi @JeffBezanson,

```julia
julia> n = Int32(1e8);

julia> @time v = shuffle(Base.OneTo(n));
  2.953982 seconds (3 allocations: 381.470 MiB, 0.06% gc time)

julia> @time v = ppshuffle(Base.OneTo(n));
  0.408063 seconds (82 allocations: 381.485 MiB, 0.17% gc time)

julia> @time randperm!(v);
  2.761123 seconds

julia> @time pprandperm!(v);
  0.395918 seconds (80 allocations: 15.500 KiB)

julia> isperm(v)
true
```

Please let us know what you think.
add ppshuffle, pprandperm to stdlib.Random (ppmisc.jl)
Below is the rationale behind the method, presented recently at WAW2023 (slides 10-15, 18-21).
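For readers without the slides, here is a minimal sketch of the general partitioned-shuffle idea; this is my own illustration (the `partitioned_shuffle` name and its body are not the ppmisc.jl code): elements are scattered into buckets by independent uniform draws, each bucket is shuffled in its own task, and the buckets are concatenated, which together yields a uniformly random permutation.

```julia
using Random

# Illustrative sketch only (not the ppmisc.jl implementation): scatter the
# input into `nparts` buckets by independent uniform draws, shuffle each
# bucket in parallel, then concatenate.
function partitioned_shuffle(A::AbstractVector; nparts::Integer = Threads.nthreads())
    buckets = [eltype(A)[] for _ in 1:nparts]
    for x in A                              # sequential scatter pass
        push!(buckets[rand(1:nparts)], x)
    end
    Threads.@threads for i in 1:nparts      # per-bucket shuffles run in parallel
        shuffle!(buckets[i])
    end
    return reduce(vcat, buckets)
end
```

The actual `ppshuffle!` in the diff differs at least in that it writes into a preallocated destination `B`, as its signature below shows, rather than growing per-bucket vectors.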
Imo, if these are strictly better than the regular versions, shouldn't they just replace them? Edit: it also seems like the number of threads used should be user-selectable.
```julia
"""
function ppshuffle!(r::TaskLocalRNG, B::AbstractArray{T}, A::Union{AbstractArray{T}, Base.OneTo{T}}) where {T<:Integer}
    nparts = max(2, (length(A) * sizeof(T)) >> 21)
```
where does this come from?
An experimental 'rule of thumb' that targets a partition size of 2 MiB and a minimum partition count of 2. It should be replaced by a more robust heuristic in the future.
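To make the rule of thumb concrete with the benchmark's input size (my own arithmetic, not from the PR): `>> 21` divides the byte count by 2^21 = 2 MiB, so 10^8 Int32 values give roughly 190 partitions.

```julia
julia> n = 10^8; sizeof(Int32) * n         # total payload in bytes (~381 MiB)
400000000

julia> max(2, (sizeof(Int32) * n) >> 21)   # one partition per 2 MiB, at least 2
190
```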
These functions use more memory than the standard ones, so there is a trade-off (one that is usually worth paying, though, as the overhead is small). @tolcz - can you please comment on the memory allocation comparison? Thank you!
Hi, thank you for the review and for your comments.
The fallback to sequential processing for input that is not 'large enough' for parallel processing is a good idea. In fact, it was already implemented in an earlier version of the code and could easily be restored. I removed it because finding an optimal 'transition size' is platform-dependent, so I decided to keep the methods separate and leave the choice to the user, at least for now.
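A minimal sketch of what such a size-based fallback could look like; the cutoff constant and the `shuffle_auto` wrapper are hypothetical, not part of this PR:

```julia
using Random

# Hypothetical size-based dispatch: below a (platform-dependent) cutoff, stay
# on the sequential path; above it, use the parallel partitioned path.
const PARALLEL_CUTOFF_BYTES = 4 << 20   # 4 MiB, purely illustrative

function shuffle_auto(A::AbstractVector{<:Integer})
    small = length(A) * sizeof(eltype(A)) < PARALLEL_CUTOFF_BYTES
    if small || Threads.nthreads() == 1
        return shuffle(A)       # sequential fallback
    else
        return ppshuffle(A)     # parallel partitioned path from this PR
    end
end
```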
I agree, such flexibility is desirable. The current version simply uses the number of threads available to the running Julia process. I will consider this change for a subsequent release.
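For reference, the degree of parallelism the current version picks up is simply whatever the Julia process was started with, e.g.:

```julia
# started as: julia --threads=8
julia> Threads.nthreads()
8
```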
Yes, as is usual for problems that are not embarrassingly parallel, there is some overhead from parallel processing, and the problem size should be 'large enough' to compensate for it. In our case there is an auxiliary …