@@ -146,7 +146,7 @@ Mocks
enabled. Same numerical issues as ``LINK_IN_DEC``

3. ``FAST_DIVIDE`` -- Divisions are slow but required
- :math:`DD(r_p,\pi)`. This Makefile option (in mocks.options) replaces
+ :math:`DD(r_p,\pi)`. This ``Makefile`` option (in ``mocks.options``) replaces
the divisions to a reciprocal followed by a Newton-Raphson. The code
will run ~20% faster at the expense of some numerical precision.
Please check that the loss of precision is not important for your
@@ -243,25 +243,27 @@ Common Code options for both Mocks and Cosmological Boxes
2. ``USE_AVX`` -- uses the AVX instruction set found in Intel/AMD CPUs
>= 2011 (Intel: Sandy Bridge or later; AMD: Bulldozer or later).
Enabled by default - code will run much slower if the CPU does not
- support AVX instructions. On Linux, check for "avx" in /proc/cpuinfo
- under flags. If you do not have AVX, but have a SSE4 system instead,
- then check the ``develop`` branch for the SSE4 code.
+ support AVX instructions. The ``Makefile`` will automatically check
+ for "AVX" support and disable this option for unsupported CPUs.

3. ``USE_OMP`` -- uses OpenMP parallelization. Scaling is great for DD
(perfect scaling up to 12 threads in my tests) and okay (runtime
becomes constant ~6-8 threads in my tests) for ``DDrppi`` and ``wp``.
+ Enabled by default. The ``Makefile`` will compare the `CC` variable with
+ known OpenMP enabled compilers and set compile options accordingly.

*Optimization for your architecture*

1. The values of ``bin_refine_factor`` and/or ``zbin_refine_factor`` in
- the countpairs\_\*.c files control the cache-misses, and
+ the ``countpairs\_\*.c`` files control the cache-misses, and
consequently, the runtime. In my trial-and-error methods, I have seen
any values larger than 3 are always slower. But some different
combination of 1/2 for ``(z)bin_refine_factor`` might be faster on
your platform.

- 2. If you have AVX2/AVX-512/KNC, you will need to rewrite the entire AVX
- section.
+ 2. If you have AVX2/AVX-512/KNC, you will need to add a new kernel within
+ the ``*_kernels.c`` and edit the runtime dispatch code to call this new
+ kernel.

Author
======
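The ``FAST_DIVIDE`` entry in the first hunk describes replacing a division with a reciprocal followed by a Newton-Raphson step. The Python sketch below only illustrates that idea numerically; it is not the code touched by this diff. The function names, the 12-bit accuracy of the initial estimate, and the single refinement step are all assumptions made for illustration.

```python
import math


def approx_reciprocal(d, bits=12):
    """Return 1/d rounded to roughly `bits` significant bits.

    Hypothetical stand-in for a fast, low-precision reciprocal estimate.
    It uses an exact division internally only to emulate such an estimate;
    the bit count is an assumption, not a documented value.
    """
    mantissa, exponent = math.frexp(1.0 / d)  # 1/d == mantissa * 2**exponent
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def fast_divide(a, d, steps=1):
    """Approximate a / d as a * (refined reciprocal of d).

    Each Newton-Raphson step x <- x * (2 - d * x) roughly squares the
    relative error of the reciprocal estimate x.
    """
    x = approx_reciprocal(d)
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return a * x


def relative_error(approx, exact):
    """Relative difference between an approximation and the exact value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    exact = 1.0 / 3.0
    print("estimate only      :", relative_error(approx_reciprocal(3.0), exact))
    print("one refinement step:", relative_error(fast_divide(1.0, 3.0), exact))
```

Comparing the two printed relative errors against the tolerance your pair counts need is one way to carry out the precision check the text asks for before enabling the option.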