add `Random.fork(rng::Xoshiro)` to split `rng` into a new instance #58193

rfourquet · 2025-04-22T12:30:03Z

When new tasks are spawned, the task-local RNG of the current task is "split", or "forked", into a new RNG for the new task. This is achieved in a fast and sound way, involving a secondary RNG "hidden" in the 5th field of the TaskLocalRNG struct.
Xoshiro was made to mirror TaskLocalRNG and also has a 5th field, mostly unused. It was useful for example when you want to save the state of TaskLocalRNG() via copy(TaskLocalRNG()), which returns a Xoshiro.

This PR re-implements in julia this forking mechanism from jl_rng_split() in "src/task.c", to allow forking Xoshiro RNGs independently of task spawning. This is in particular useful for parallel (reproducible) computations, where tasks can be spawned with a forked (explicit) RNG from an initial Xoshiro object.

There are alternatives, like "jumping", or for example seeding new RNGs objects from a master seed and a "worker id", like in Xoshiro([master_seed, worker_id]), but these techniques require some coordination. For example, it can be unsafe to locally create a new RNG via jumping, because you might have collision with another RNG created from the parent task also via jumping with the same number of steps.
In contrast, "forking" allows to make local decisions about the shape of the computation without risks of collisions.
And it's simpler: you just write forked_rng = Random.fork(src_rng).

Alternative names could be

spawn, which is used in Numpy, but that I find less descriptive than fork; or
split, perhaps what is mostly found in the literature, and also appears in the name jl_rng_split; one problem being that Base.split means something else.

"Fork" seems appropriate as it's used in the name Random.forkRand for array-filling with SIMD, and in the long comment above jl_rng_split.

rfourquet · 2025-05-06T15:57:53Z

Small changes:

With Random: allow seeding from an RNG #51533 allowing seeding from an rng, fork!(dst, src) becomes redundant with seed!(dst, src), so implement the latter via the same forking algorithm.
add an array version, which is useful for example with Threads.@threads
remove fork(TaskLocalRNG()): it initially returned a Xoshiro(), but this was complicating the docstring; returning TaskLocalRNG() might be better if it's appropriate for all expected uses of fork. But punt on that for now.

fingolfin · 2025-05-08T00:33:39Z

has merge conflicts now

When new tasks are spawned, the task-local RNG of the current task is "split", or "forked", into a new RNG for the new task. This is achieved in a fast and sound way, involving a secondary RNG "hidden" in the 5th field of the `TaskLocalRNG` struct. `Xoshiro` was made to mirror `TaskLocalRNG` and also has a 5th field, mostly unused. It was useful for example when you want to save the state of `TaskLocalRNG()` via `copy(TaskLocalRNG())`, which returns a `Xoshiro`. This PR re-implements in julia this forking mechanism from `jl_rng_split()` in "src/task.c", to allow forking `Xoshiro` RNGs independently of task spawning. This is in particular useful for parallel (reproducible) computations, where tasks can be spawned with a forked (explicit) RNG from an initial `Xoshiro` object. There are alternatives, like "jumping", or for example seeding new RNGs objects from a master seed and a "worker id", like in `Xoshiro((master_seed, id))`, but these techniques require some coordination. For example, it can be unsafe to locally create a new RNG via jumping, because you might have collision with another RNG created from the parent task also via jumping with the same number of steps. In contrast, "forking" allows to make local decisions about the shape of the computation without risks of collisions. Alternative names could be * `spawn`, which is used in Numpy, but that I find less descriptive than `fork`; or * `split`, perhaps what is mostly found in the literature, and also appears in the name `jl_rng_split`; one problem being that `Base.split` means something else. "Fork" seems appropriate as it's used in the name `Random.forkRand` for array-filling with SIMD, and in the long comment above `jl_rng_split`.

Now that seed! accepts an rng as 2nd arg, fork! is replaced by seed!. Array version is useful for `@threads` (cf. docstring). `fork(TaskLocalRNG())` is ambiguous: return a Xoshiro or TaskLocalRNG() ? Leave that question for later. Also, returning Xoshiro was seriously complicating the docstring.

We have long had methods for RNG "jumps ahead", i.e. advancing the state by a given number of "steps", but no good API for that. The only public API is `Future.randjump(r::MersenneTwister, steps::Integer)`, and there are also functions for `Xoshiro` which are not public (`Random.jump_128` and friends). The following generic API is implemented here: * `Random.jump(rng)` to jump by a reasonable default number of steps * `Random.jump(rng; by::Real)` to jump by `by` steps * `Random.jump!(rng; [by])` to equivalently jump in-place * `Random.jump(rng, dims...)` to create an array of jumped RNGs In old julia versions, there also existed a method of `randjump` returning an array, but the 1st element of this array was the passed argument; the version here does not do this aliasing. There are two kinds of integers one would wish to pass: dimensions for the array version, and the number of steps. Using jumps is relatively "niche", but needing to fidle with the number of steps is even more niche. It's expected that in the vast majority of cases, a good default is enough. Some APIs in other languages have `jump` (e.g. 2^128 steps) and `long_jump` (e.g. 2^192 steps), or `leap` in java, for more complicated cases; for example each process gets a jumped RNG via `long_jump`, and within each process, each thread gets a jumped RNG via `jump`. But this is not very scalable if more kind of jumps are needed: should `huge_jump` be introduced? For these rare cases where the default number of steps is not sufficient, it seems better to let the programmer explicitly specify the number of steps via an integer. There is even a third kind of integers one might want to pass: in `Random.jump_128(x::Xoshiro, i::Integer)`, `i` represents the number of times a jump of size `2^128` is applied; this is because `Xoshiro` doesn't support arbitrary number of steps; this is not supported in the proposed API, because 1) it's trivial for the user to implement herself, and 2) in probably most use cases, using the array version will be a valid alternative, and more efficient because previous computations are not wasted (like in `[Random.jump_128(x, i) for i=1:num_tasks]` vs `Random.jump(x, num_tasks)`). Another argument in favor of this API is that it mirrors the proposed `Random.fork(rng, dims...)` function from #58193.

We have long had methods for RNG "jumps ahead", i.e. advancing the state by a given number of "steps", but no good API for that. The only public API is `Future.randjump(r::MersenneTwister, steps::Integer)`, and there are also functions for `Xoshiro` which are not public (`Random.jump_128` and friends). The following generic API is implemented here: * `Random.jump(rng)` to jump by a reasonable default number of steps * `Random.jump(rng; by::Real)` to jump by `by` steps * `Random.jump!(rng; [by])` to equivalently jump in-place * `Random.jump(rng, dims...; [by])` to create an array of jumped RNGs In old julia versions, there also existed a method of `randjump` returning an array, but the 1st element of this array was the passed argument; the version here does not do this aliasing. There are two kinds of integers one would wish to pass: dimensions for the array version, and the number of steps. Using jumps is relatively "niche", but needing to fidle with the number of steps is even more niche. It's expected that in the vast majority of cases, a good default is enough. Some APIs in other languages have `jump` (e.g. 2^128 steps) and `long_jump` (e.g. 2^192 steps), or `leap` in java, for more complicated cases; for example each process gets a jumped RNG via `long_jump`, and within each process, each thread gets a jumped RNG via `jump`. But this is not very scalable if more kind of jumps are needed: should `huge_jump` be introduced? For these rare cases where the default number of steps is not sufficient, it seems better to let the programmer explicitly specify the number of steps via an integer. There is even a third kind of integers one might want to pass: in `Random.jump_128(x::Xoshiro, i::Integer)`, `i` represents the number of times a jump of size `2^128` is applied; this is because `Xoshiro` doesn't support arbitrary number of steps; this is not supported in the proposed API, because 1) it's trivial for the user to implement herself, and 2) in probably most use cases, using the array version will be a valid alternative, and more efficient because previous computations are not wasted (like in `[Random.jump_128(x, i) for i=1:num_tasks]` vs `Random.jump(x, num_tasks)`). Another argument in favor of this API is that it mirrors the proposed `Random.fork(rng, dims...)` function from #58193.

rfourquet added randomness Random number generation and the Random stdlib feature Indicates new feature / enhancement requests labels Apr 22, 2025

rfourquet force-pushed the rf/randfork branch from 889c18f to 0e56550 Compare May 6, 2025 15:52

rfourquet force-pushed the rf/randfork branch from 0e56550 to b9a6a2d Compare May 6, 2025 16:10

rfourquet added 3 commits May 8, 2025 09:26

small updates

edceb0b

rfourquet force-pushed the rf/randfork branch from b9a6a2d to 60d7bdb Compare May 8, 2025 07:42

rfourquet mentioned this pull request May 8, 2025

add Random.jump(rng) API #58353

Open

few fixes and updates

dbe2333

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

add `Random.fork(rng::Xoshiro)` to split `rng` into a new instance #58193

add `Random.fork(rng::Xoshiro)` to split `rng` into a new instance #58193

Uh oh!

rfourquet commented Apr 22, 2025 •

edited

Loading

Uh oh!

rfourquet commented May 6, 2025

Uh oh!

fingolfin commented May 8, 2025

Uh oh!

Uh oh!

Uh oh!

add Random.fork(rng::Xoshiro) to split rng into a new instance #58193

Are you sure you want to change the base?

add Random.fork(rng::Xoshiro) to split rng into a new instance #58193

Uh oh!

Conversation

rfourquet commented Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rfourquet commented May 6, 2025

Uh oh!

fingolfin commented May 8, 2025

Uh oh!

Uh oh!

add `Random.fork(rng::Xoshiro)` to split `rng` into a new instance #58193

add `Random.fork(rng::Xoshiro)` to split `rng` into a new instance #58193

rfourquet commented Apr 22, 2025 •

edited

Loading