|
| 1 | +This file contains unreleased/unannounced changes for parallelization, in addition to those listed in VERSIONS. It will be merged into VERSIONS once parallel SLiM is released publicly. |
| 2 | + |
| 3 | +PARALLEL changes (now in the master branch, but disabled): |
| 4 | + TO DO: the openmp branch had changes to travis.yml to test the openmp branch, which should be replicated somehow for GitHub Actions: |
| 5 | + using macos xcode11.6/clang and linux bionic/gcc |
| 6 | + on osx, "brew install libomp;" "ls -l /usr/local/opt/libomp/include/omp.h;" "ls -l /usr/local/include;" |
| 7 | + note the paths for the headers/lib are likely different, so that will require a bit of legwork |
| 8 | + updated SLiM.xcodeproj for OpenMP configuration: |
| 9 | + duplicate the entire SLiM directory to SLiM_temp, open the SLiM.xcodeproj file in the copy, rename it in the Project Navigator to "SLiM_OpenMP", copy it back into the SLiM folder and delete the rest of the copy |
| 10 | + use grep to find "SLiM_temp" inside files in the new SLiM_OpenMP.xcodeproj and fix them to refer to the correct directory (might be just one, or none) |
| 11 | + add system header search path for targets eidos_multi and SLiM_multi, in Build Settings: /usr/local/include/ (non-recursive) |
| 12 | + add library search path for targets eidos_multi and SLiM_multi, in Build Settings: /usr/local/lib/ (non-recursive) |
| 13 | + add link to binary library, under "Build Phases" (Link Binary With Libraries) on targets eidos_multi and SLiM_multi, by dragging in /usr/local/lib/libomp.dylib |
| 14 | + add to Other C Flags and Other C++ Flags for targets eidos_multi and SLiM_multi, in Build Settings: -Xclang and -fopenmp |
| 15 | + disable library validation for targets eidos_multi and SLiM_multi, in Signing & Capabilities: https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_disable-library-validation |
| 16 | + set Enable Index-While-Building Functionality to NO in Build Settings on targets eidos_multi and SLiM_multi |
| 17 | + disable Link-Time Optimization, which appears to be incompatible with OpenMP |
| 18 | + switched "Build Active Architecture Only" in Build Settings to YES for Release builds, for targets eidos_multi and SLiM_multi, because the older libomp.dylib builds do not have arm64 in them |
| 19 | + see http://antonmenshov.com/2017/09/09/clang-openmp-setup-in-xcode/ |
| 20 | + added -Rpass="loop|vect" -Rpass-missed="loop|vect" -Rpass-analysis="loop|vect" to Other C++ Flags for eidos_multi, to aid in optimization work |
| 21 | + added "-fno-math-errno" to the project-level Other C++ Flags, to match the main project |
| 22 | + added $(OTHER_CPLUSPLUSFLAGS) to that build setting for slim_multi and eidos_multi, to inherit from the project's settings |
| 23 | + require 10.13 or later for eidos_multi and slim_multi, since that's the macOS version that the R libomp packages target |
| 24 | + added SLiMguiLegacy_multi and EidosScribe_multi targets/schemes to the project for debugging/testing purposes; not intended for end users! |
| 25 | + modify CMakeLists.txt to add a "PARALLEL" flag that can be set to "ON" to enable OpenMP with the -fopenmp flag |
| 26 | + modify CMakeLists.txt to add /usr/local/include to all targets (through GSL_INCLUDES, which should maybe get cleaned up somehow) |
| 27 | + modify CMakeLists.txt to build executables eidos_multi and slim_multi when PARALLEL is ON, and to link in libomp |
| 28 | + disable parallel builds of EidosScribe, SLiMgui, and SLiMguiLegacy with #error directives; parallelization is for the command line only (active threads peg the CPU, passive threads are slower than single-threaded) |
| 29 | + add a command-line option, -maxThreads <n>, to control the maximum number of threads used in OpenMP |
| 30 | + add a header eidos_openmp.h that should be used instead of #include "omp.h", providing various useful definitions etc. |
| 31 | + add an initialization function for OpenMP, Eidos_WarmUpOpenMP(), that must be called when warming up; sets various options, logs diagnostic info |
| 32 | + set default values for OMP_WAIT_POLICY (active) and OMP_PROC_BIND (false) and OMP_DYNAMIC (false) for slim and eidos |
| 33 | + add THREAD_SAFETY_CHECK() convention for marking areas that are known not to be thread-safe, such as due to use of statics or globals |
| 34 | + add gEidosMaxThreads global that has the max number of threads for the session (for pre-allocating per-thread data structures) |
| 35 | + create per-thread SparseVector pools to allow high-scalability parallelization of spatial interactions |
| 36 | + add parallel sorting code - quicksort and mergesort, rather experimental at this point, just early work |
| 37 | + create per-thread random number generators to allow parallelization of code that involves RNG usage |
| 38 | + parallelize dnorm(), rbinom(), rdunif(), rexp(), rnorm(), rpois(), runif() |
| 39 | + parallelize tallying mutation refcounts, for the optimized fast case |
| 40 | + parallelize fitness evaluation for mutations, when there are no mutationEffect() or fitnessEffect() callbacks |
| 41 | + parallelize non-mutation-based fitness evaluation (i.e., only fitnessScaling values) |
| 42 | + parallelize some minor tick cycle bookkeeping: incrementing individual ages, clearing the migrant flag |
| 43 | + parallelize the viability/survival tick cycle stage's survival evaluation (without callbacks) |
| 44 | + parallelize localPopulationDensity(), clippedIntegral(), and neighborCount() |
| 45 | + add gEidosNumThreads global that has the currently requested number of threads (usually equal to gEidosMaxThreads) |
| 46 | + add functions to Eidos to support parallel execution: parallelGetNumThreads(), parallelGetMaxThreads(), and parallelSetNumThreads() |
| 47 | + add unit tests for parallelized functions in Eidos, comparing against the same functions non-parallel |
| 48 | + add nowait clause on parallel for loops that would otherwise have a double barrier |
| 49 | + shift to allocating per-thread RNGs in parallelized allocations, so "first touch" places each RNG in memory close to that thread's core |
| 50 | + switch to dynamic scheduling for rbinom() and rdunif(), which seem to need it for unclear reasons |
| 51 | + use a minimum chunk size of 16 for all dynamic schedules; problems that parallelize should always be at least that big, I imagine |
| 52 | + the exception is countOfMutationsOfType() / sumOfMutationsOfType() and similar, because they do so much work per iteration that they can be worth going parallel even for two iterations |
| 53 | + note that we never use schedule(guided) because its implementation in GCC is brain-dead; see https://stackoverflow.com/a/43047074/2752221 |
| 54 | + switch to OMP_PROC_BIND == true by default; keep threads with the data they're working on, recommended practice |
| 55 | + parallelize more Eidos math functions: abs(float), ceil(), exp(float), floor(), log(float), log10(float), log2(float), round(), sqrt(float), trunc() |
| 56 | + parallelize Eidos min/max functions: max(int), max(float), min(int), min(float), pmax(int), pmax(float), pmin(int), pmin(float) |
| 57 | + parallelize match(), tabulate(), sample (int/float/object, replace=T, weights=NULL), mean(l/i/f) |
| 58 | + parallelize Individual -relatedness(), -countOfMutationsOfType(), -setSpatialPosition(); Species -individualsWithPedigreeIDs() |
| 59 | + parallelize Subpopulation -pointInBounds(), -pointReflected(), -pointStopped(), pointPeriodic(), pointUniform() |
| 60 | + parallelize Genome -containsMarkerMutation() |
| 61 | + change how RNG seeding works, for single-threaded and multi-threaded runs: use /dev/random to generate initial seeds, and for multi-threaded runs, use /dev/random for seeds for all threads beyond thread 0 |
| 62 | + parallelize Genome -countOfMutationsOfType(), Subpopulation -spatialMapValue() and -sampleIndividuals() |
| 63 | + add unit tests for various SLiM methods that have been parallelized |
| 64 | + parallelize InteractionType nearestNeighbors(), nearestInteractingNeighbors(), drawByStrength() by adding a returnDict parameter that lets the user get a Dictionary as a result, with one entry per receiver |
| 65 | + their "receiver" parameter is no longer required to be singleton (when returnDict == T), and their return value is now just "object" since it can be Individual or Dictionary |
| 66 | + overhaul the internal MutationRun design to allow it to be used multithreaded - get rid of refcounting, switch to usage tallying |
| 67 | + parallelize offspring generation in WF models, when no callbacks (mateChoice(), modifyChild(), recombination(), mutation()) are active; no tree-seq, no cloning |
| 68 | + parallelize offspring generation in WF model: add support for cloning |
| 69 | + parallelize tallying of mutation runs, clearing of parental genomes at the end of WF model ticks, and mutation run uniquing |
| 70 | + add support for parallel reproduction when a non-default stacking policy is in use |
| 71 | + add support for parallel reproduction with tree-sequence recording |
| 72 | + add parallel reproduction in nonWF models, with no callbacks (recombination(), mutation()) active - modifyChild() callbacks are legal but cannot access child genomes since they are deferred |
| 73 | + this is achieved by passing defer=T to addCrossed(), addCloned(), addSelfed(), or addRecombinant() |
| 74 | + thread-safety work - break in backward reproducibility for scripts that use a type 's' DFE, because the code path for that shifted |
| 75 | + algorithm change for nearestNeighbors() and nearestNeighborsOfPoint(), when count is >1 and <N; the old algorithm was not thread-safe, and was inefficient; breaks backward reproducibility for the relevant case |
| 76 | + the break in backward reproducibility is because the old algorithm returned the neighbors in a scrambled order, whereas the new algorithm returns them in sorted order; the order is not documented, so both are compliant |
| 77 | + add max thread counts for each parallel loop, allowing flexible customization both by Eidos and by the user |
| 78 | + tweak the semantics of parallelSetNumThreads(): an integer requests that exact number of threads, whereas NULL requests "up to" the max number of threads (the default behavior) |
| 79 | + add gEidosNumThreadsOverride flag indicating whether the number of threads has been overridden explicitly |
| 80 | + add parallelSetTaskThreadCounts() function to set per-task thread counts, and parallelGetTaskThreadCounts() to get them |
| 81 | + add recipe 22.7, Parallelizing nonWF reproduction and spatial interactions |
| 82 | + add internal EidosBenchmark facility for timing of internal loops, for benchmarking of parallelization scaling |
| 83 | + add a command-line option, -perTaskThreads, to let the user control how many threads to use for different tasks (if not overridden) |
| 84 | + parallelize sorting, where possible/efficient, add requisite per-task thread count keys |
| 85 | + parallelize edge table sorting during simplification, add requisite per-task thread count keys |
| 86 | + |
0 commit comments