
Commit a8e1ff8

docs: improve the "Build system" page
1 parent c1b9bb8 commit a8e1ff8

1 file changed, +23 -7 lines changed
docs/build_system.md

Lines changed: 23 additions & 7 deletions
@@ -1,3 +1,8 @@
+This page describes the Make-based build, which is the default/authoritative
+build method. Note that the OpenBLAS repository also supports building with
+CMake (not described here) - that generally works and is tested, however there
+may be small differences between the Make and CMake builds.
+
 !!! warning
 This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself.

@@ -95,10 +100,21 @@ NUM_PARALLEL - define this to the number of OpenMP instances that your code m
 ```


-OpenBLAS uses a fixed set of memory buffers internally, used for communicating and compiling partial results from individual threads.
-For efficiency, the management array structure for these buffers is sized at build time - this makes it necessary to know in advance how
-many threads need to be supported on the target system(s).
-With OpenMP, there is an additional level of complexity as there may be calls originating from a parallel region in the calling program. If OpenBLAS gets called from a single parallel region, it runs single-threaded automatically to avoid overloading the system by fanning out its own set of threads.
-In the case that an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not
-sufficient as the additional caller(s) would compete for the original set of buffers already in use by the first call.
-So if multiple OpenMP runtimes call into OpenBLAS at the same time, then only one of them will be able to make progress while all the rest of them spin-wait for the one available buffer. Setting NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that there are a sufficient number of buffer sets available
+OpenBLAS uses a fixed set of memory buffers internally, used for communicating
+and compiling partial results from individual threads. For efficiency, the
+management array structure for these buffers is sized at build time - this
+makes it necessary to know in advance how many threads need to be supported on
+the target system(s).
+
+With OpenMP, there is an additional level of complexity as there may be calls
+originating from a parallel region in the calling program. If OpenBLAS gets
+called from a single parallel region, it runs single-threaded automatically to
+avoid overloading the system by fanning out its own set of threads. In the case
+that an OpenMP program makes multiple calls from independent regions or
+instances in parallel, this default serialization is not sufficient as the
+additional caller(s) would compete for the original set of buffers already in
+use by the first call. So if multiple OpenMP runtimes call into OpenBLAS at the
+same time, then only one of them will be able to make progress while all the
+rest of them spin-wait for the one available buffer. Setting `NUM_PARALLEL` to
+the upper bound on the number of OpenMP runtimes that you can have in a process
+ensures that there are a sufficient number of buffer sets available.
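
Note on the `NUM_PARALLEL` paragraph in the diff above: the sketch below shows roughly how such a build might be invoked. It is an illustration under assumptions, not part of the commit. `USE_OPENMP` and `NUM_THREADS` are OpenBLAS make variables that do not appear in the lines shown here, the helper name `openblas_make_cmd` is hypothetical, and the values 64 and 4 are placeholders rather than recommendations. The helper only prints the command so that it can be inspected before anything is built.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenBLAS): compose a make invocation
# for an OpenMP build with room for several concurrent callers.
openblas_make_cmd() {
    num_threads="$1"    # assumed upper bound on threads to support
    num_parallel="$2"   # assumed upper bound on concurrent OpenMP callers
    echo "make USE_OPENMP=1 NUM_THREADS=${num_threads} NUM_PARALLEL=${num_parallel}"
}

# Print the command instead of running it; the values are placeholders.
openblas_make_cmd 64 4
```

Both values are fixed at build time, so they would need to be chosen as upper bounds for the target system(s) rather than tuned per run.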
