This page describes the Make-based build, which is the default and authoritative
build method. The OpenBLAS repository also supports building with CMake (not
described here); that build generally works and is tested, but there may be
small differences between the Make and CMake builds.

!!! warning
    This page was written by someone who is not an OpenBLAS developer and should not be considered official documentation of the build system. For the full picture, it is best to read the Makefiles and understand them yourself.

```

OpenBLAS uses a fixed set of memory buffers internally, which serve for
communicating and collecting the partial results computed by individual
threads. For efficiency, the management array structure for these buffers is
sized at build time, so the number of threads that must be supported on the
target system(s) has to be known in advance.

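The consequence of sizing this table at build time can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not OpenBLAS code. `BUILT_FOR_THREADS`, `BufferTable` and `threads_served` are hypothetical names, and the table is assumed to hold one slot per thread. Its only point is that the capacity is fixed before any call is made and cannot grow at run time.

```python
# Toy model only (hypothetical names, not OpenBLAS source code).
# BUILT_FOR_THREADS stands in for the thread count fixed when the library is built.
BUILT_FOR_THREADS = 4


class BufferTable:
    """Management table whose length is fixed at creation and never changes."""

    def __init__(self, num_slots=BUILT_FOR_THREADS):
        # One entry per thread buffer; None marks a free slot.
        # Sequential model: claim() and release() are not thread-safe.
        self._owners = [None] * num_slots

    def claim(self, thread_id):
        """Record thread_id in the first free slot and return the slot index.

        Returns None when every slot is taken: the table cannot grow at run time.
        """
        for slot, owner in enumerate(self._owners):
            if owner is None:
                self._owners[slot] = thread_id
                return slot
        return None

    def release(self, slot):
        self._owners[slot] = None


def threads_served(requested_threads):
    """Count how many of `requested_threads` workers obtain a slot in a fresh table."""
    table = BufferTable()
    slots = [table.claim(thread_id) for thread_id in range(requested_threads)]
    return sum(slot is not None for slot in slots)


if __name__ == "__main__":
    for requested in (2, 4, 8):
        print(requested, threads_served(requested))
```

In this model, requesting 8 threads from a table built for 4 leaves four of them without a slot. The model does not cover how the real library handles more threads than it was built for.
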
With OpenMP there is an additional complication, because calls may originate
from a parallel region in the calling program. If OpenBLAS is called from
within a single parallel region, it automatically runs single-threaded instead
of fanning out its own set of threads, which would overload the system.

This default serialization is not sufficient when an OpenMP program makes
multiple calls in parallel from independent regions or instances: the
additional callers would compete for the set of buffers already in use by the
first call. If multiple OpenMP runtimes call into OpenBLAS at the same time,
only one of them can make progress, while all the others spin-wait for the one
available buffer set. Setting `NUM_PARALLEL` to the upper bound on the number
of OpenMP runtimes that can be active in a process ensures that enough buffer
sets are available.
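The contention described above can be reproduced with a second toy model. This is a minimal Python sketch with several assumptions:

- Python threads stand in for independent OpenMP callers.
- Each caller needs one whole buffer set.
- `BufferSetPool`, `try_acquire`, `acquire_spinning` and `run_callers` are hypothetical names, not OpenBLAS internals.

The pool size plays the role of `NUM_PARALLEL`. A `threading.Barrier` keeps every successful caller on its buffer set until all callers have made their first attempt. This makes the number of callers forced to spin-wait deterministic.

```python
import threading
import time


class BufferSetPool:
    """Toy model (hypothetical, not OpenBLAS code): a fixed number of buffer
    sets, playing the role of NUM_PARALLEL."""

    def __init__(self, num_parallel):
        # One lock per buffer set; holding the lock means the set is in use.
        self._locks = [threading.Lock() for _ in range(num_parallel)]

    def try_acquire(self):
        """Claim any free buffer set without blocking; return its index or None."""
        for index, lock in enumerate(self._locks):
            if lock.acquire(blocking=False):
                return index
        return None

    def acquire_spinning(self):
        """Spin-wait, yielding between attempts, until a buffer set is free."""
        while True:
            index = self.try_acquire()
            if index is not None:
                return index
            time.sleep(0)  # let the current holders run and release

    def release(self, index):
        self._locks[index].release()


def run_callers(num_parallel, num_callers):
    """Let num_callers threads call in at the same time.

    Returns how many callers found no free buffer set on their first attempt
    and therefore had to spin-wait.
    """
    pool = BufferSetPool(num_parallel)
    # Nobody releases a set until every caller has made its first attempt,
    # which makes the returned count deterministic.
    first_attempts_done = threading.Barrier(num_callers)
    counter_lock = threading.Lock()
    waited = 0

    def caller():
        nonlocal waited
        index = pool.try_acquire()
        first_attempts_done.wait()
        if index is None:
            with counter_lock:
                waited += 1
            index = pool.acquire_spinning()
        # ... the BLAS work using buffer set `index` would happen here ...
        pool.release(index)

    threads = [threading.Thread(target=caller) for _ in range(num_callers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return waited


if __name__ == "__main__":
    for num_parallel in (1, 2, 4):
        print(num_parallel, run_callers(num_parallel, num_callers=4))
```

With four simultaneous callers, the demo prints `1 3`, `2 2` and `4 0`, one pair per line. With a single buffer set, three of the four callers spin-wait. The waiting disappears once the pool is at least as large as the number of concurrent callers, which is the motivation for setting `NUM_PARALLEL` to that upper bound.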