For advanced usage, refer to the [NUMA section below](#non-uniform-memory-access-numa).
For convenient Rayon-style parallel iterators, pull in the `prelude` module and [check out the related examples](#rayon-style-parallel-iterators).

### Intro in C++

@@ -369,6 +370,68 @@ No kernel calls.
No futexes.
Works in tight loops.

### Rayon-style Parallel Iterators

For Rayon-style ergonomics, use the parallel iterator API with the `prelude`.
Unlike Rayon, Fork Union's parallel iterators don't depend on global state and allow explicit control over the thread pool and scheduling strategy.
For statically shaped workloads, the default static scheduling is more efficient:

```rust
use fork_union as fu;
use fork_union::prelude::*;

let mut pool = fu::spawn(4);
let mut data: Vec<usize> = (0..1000).collect();

(&data[..])
    .into_par_iter()
    .with_pool(&mut pool)
    .for_each(|value| {
        println!("Value: {}", value);
    });
```
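
To make "static scheduling" concrete, here is a minimal sketch that uses only the Rust standard library. It is an illustration of the general idea, not Fork Union's implementation: the input is cut into one contiguous slice per thread up front, so threads need no coordination while the work runs. The `static_for_each` helper and its signature are hypothetical names chosen for this example.

```rust
use std::thread;

/// Hypothetical helper (not part of Fork Union): statically partitions `data`
/// into at most `threads` contiguous slices and applies `f` to every element.
fn static_for_each<T, F>(data: &mut [T], threads: usize, f: F)
where
    T: Send,
    F: Fn(&mut T) + Sync,
{
    if data.is_empty() || threads == 0 {
        return;
    }
    // Ceiling division, so every element lands in exactly one slice.
    let chunk_len = (data.len() + threads - 1) / threads;
    let f = &f;
    thread::scope(|scope| {
        for chunk in data.chunks_mut(chunk_len) {
            // Each thread owns its slice exclusively; no locks or atomics are needed.
            scope.spawn(move || chunk.iter_mut().for_each(f));
        }
    });
}

fn main() {
    let mut data: Vec<usize> = (0..1000).collect();
    static_for_each(&mut data, 4, |value| *value *= 2);
    println!("first: {}, last: {}", data[0], data[999]);
}
```

The split is decided before any work starts, which is cheap when every element costs about the same, but it cannot rebalance if one slice turns out to be slower than the others.
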

For dynamic work-stealing, use `with_schedule` with `DynamicScheduler`:

```rust
(&mut data[..])
    .into_par_iter()
    .with_schedule(&mut pool, DynamicScheduler)
    .for_each(|value| {
        *value *= 2;
    });
```
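
For intuition about dynamic scheduling, the sketch below again uses only the standard library and should be read as an assumption-laden illustration, not as how `DynamicScheduler` is implemented. Instead of fixed slices, threads repeatedly claim the next unprocessed index from a shared atomic counter. This is a simple shared-counter scheme, not a full work-stealing deque, and `dynamic_for_each` is a hypothetical name. The example accumulates a sum, since handing out mutable elements dynamically would need more machinery than fits here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper (not part of Fork Union): each thread claims task
/// indices one at a time from a shared counter until all tasks are taken.
fn dynamic_for_each<F>(tasks: usize, threads: usize, f: F)
where
    F: Fn(usize) + Sync,
{
    let next = AtomicUsize::new(0);
    thread::scope(|scope| {
        for _ in 0..threads {
            scope.spawn(|| loop {
                // `fetch_add` hands out every index exactly once across all threads.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= tasks {
                    break;
                }
                f(i);
            });
        }
    });
}

fn main() {
    let data: Vec<usize> = (0..1000).collect();
    let sum = AtomicUsize::new(0);
    dynamic_for_each(data.len(), 4, |i| {
        sum.fetch_add(data[i] * 2, Ordering::Relaxed);
    });
    println!("sum of doubled values: {}", sum.load(Ordering::Relaxed));
}
```

A thread that finishes early simply claims more indices, so uneven task costs balance out, at the price of one atomic operation per task.
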

This easily composes with other iterator adaptors, like `map`, `filter`, and `zip`:

```rust
(&data[..])
    .into_par_iter()
    .filter(|&x| x % 2 == 0)
    .map(|x| x * x)
    .with_pool(&mut pool)
    .for_each(|value| {
        println!("Squared even: {}", value);
    });
```

Moreover, each thread can maintain its own scratch space to avoid contention during reductions.
Cache-line alignment via `CacheAligned` prevents false sharing:

```rust
// Cache-line aligned wrapper to prevent false sharing
```

> ¹ Another common workload is "Parallel Reductions" covered in a separate [repository](https://github.com/ashvardanian/ParallelReductionsBenchmark).
> ² When a combination of performance and efficiency cores is used, dynamic stealing may be more efficient than static slicing. It's also fair to say that OpenMP is not optimized for AppleClang.
> 🔄 The rotation emoji stands for iterators: the default way to use Rayon, and the opt-in, slower but more convenient variant for Fork Union.

You can rerun those benchmarks with the following commands: