Commit 9ba9c8b

Merge pull request #4165 from rgommers/docs-packaging-and-ilp64
Add documentation on redistributing OpenBLAS
2 parents 849c880 + ee72575 commit 9ba9c8b

1 file changed

+270
-0
lines changed

docs/distributing.md

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
1+
# Guidance for redistributing OpenBLAS
2+
3+
*We note that this document contains recommendations only - packagers and other
4+
redistributors are in charge of how OpenBLAS is built and distributed in their
5+
systems, and may have good reasons to deviate from the guidance given on this
6+
page. These recommendations are aimed at general packaging systems, with a user
7+
base that typically is large, open source (or freely available at least), and
8+
neither behaves uniformly nor is directly connected with the packager.*
9+
10+
OpenBLAS has a large number of build-time options which can be used to change
11+
how it behaves at runtime, how artifacts or symbols are named, etc. Variation
12+
in build configuration can be necessary to achieve a given end goal within a
13+
distribution or as an end user. However, such variation can also make it more
14+
difficult to build on top of OpenBLAS and ship code or other packages in a way
15+
that works across many different distros. Here we provide guidance about the
16+
most important build options, what effects they may have when changed, and
17+
which ones to default to.
18+
19+
The Make and CMake build systems provide equivalent options and yield more or
20+
less the same artifacts, but not exactly (the CMake builds are still
21+
experimental). You can choose either one and the options will function in the
22+
same way; however, the CMake outputs may require some renaming. To review
23+
available build options, see `Makefile.rule` or `CMakeLists.txt` in the root of
24+
the repository.
25+
26+
Build options typically fall into two categories: (a) options that affect the
27+
user interface, such as library and symbol names or APIs that are made
28+
available, and (b) options that affect performance and runtime behavior, such
29+
as threading behavior or CPU architecture-specific code paths. The user
30+
interface options are more important to keep aligned between distributions,
31+
while for the performance-related options there are typically more reasons to
32+
make choices that deviate from the defaults.
33+
34+
Here are recommendations for user interface related packaging choices where it
35+
is not likely to be a good idea to deviate (typically these are the default
36+
settings):
37+
38+
1. Include CBLAS. The CBLAS interface is widely used and it doesn't affect
39+
binary size much, so don't turn it off.
40+
2. Include LAPACK and LAPACKE. The LAPACK interface is also widely used, and
41+
while it does make up a significant part of the binary size of the installed
42+
library, that does not outweigh the regression in usability when deviating
43+
from the default here.[^1]
44+
3. Always distribute the pkg-config (`.pc`) and CMake (`.cmake`) dependency
45+
detection files. These files are used by build systems when users want to
46+
link against OpenBLAS, and there is no benefit to leaving them out.
47+
4. Provide the LP64 interface by default, and if in addition to that you choose
48+
to provide an ILP64 interface build as well, use a symbol suffix to avoid
49+
symbol name clashes (see the next section).
50+
51+
[^1]: All major distributions do include LAPACK as of mid 2023 as far as we
52+
know. Older versions of Arch Linux did not, and that was known to cause
53+
problems.
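
The recommendations above can be spot-checked on an installed tree. The following is a minimal sketch, not an official tool: the function name `check_openblas_prefix` is hypothetical, and the file list assumes the default LP64 layout produced by `make install` (headers directly under `include`, `openblas.pc` under `lib/pkgconfig`). Adjust the paths if your packaging system installs elsewhere.

```bash
# Report which of the recommended files are missing under an install prefix.
# Assumption: default LP64 layout as produced by `make install`.
check_openblas_prefix() {
    prefix="$1"
    missing=0
    for rel in \
        include/cblas.h \
        include/lapacke.h \
        lib/pkgconfig/openblas.pc \
        lib/cmake/openblas/OpenBLASConfig.cmake
    do
        if [ ! -f "$prefix/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is absent.
    return "$missing"
}
```

For example, `check_openblas_prefix "$PREFIX" && echo "layout looks complete"` prints the confirmation only when CBLAS, LAPACKE, and both kinds of dependency detection files are present.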
54+
55+
56+
## ILP64 interface builds
57+
58+
The LP64 (32-bit integer) interface is the default build, and has
59+
well-established C and Fortran APIs as determined by the reference (Netlib)
60+
BLAS and LAPACK libraries. The ILP64 (64-bit integer) interface however does
61+
not have a standard API: symbol names and shared/static library names can be
62+
produced in multiple ways, and this tends to make it difficult to use.
63+
As of today there is an agreed-upon way of choosing names for OpenBLAS between
64+
a number of key users/redistributors, which is the closest thing to a standard
65+
that there is now. However, there is an ongoing standardization effort in the
66+
reference BLAS and LAPACK libraries, which differs from the current OpenBLAS
67+
agreed-upon convention. In this section we'll aim to explain both.
68+
69+
Those two methods are fairly similar, and have a key thing in common: *using a
70+
symbol suffix*. This is good practice: if you distribute an ILP64 build, it is
71+
recommended that it use a symbol suffix containing `64` in the name.
72+
This avoids potential symbol clashes when different packages which depend on
73+
OpenBLAS load both an LP64 and an ILP64 library into memory at the same time.
74+
75+
### The current OpenBLAS agreed-upon ILP64 convention
76+
77+
This convention comprises the shared library name and the symbol suffix in the
78+
shared library. The symbol suffix to use is `64_`, implying that the library
79+
name will be `libopenblas64_.so` and the symbols in that library end in `64_`.
80+
The central issue where this was discussed is
81+
[openblas#646](https://github.com/xianyi/OpenBLAS/issues/646), and adopters
82+
include Fedora, Julia, NumPy and SciPy - SuiteSparse already used it as well.
83+
84+
To build shared and static libraries with the currently recommended ILP64
85+
conventions with Make:
86+
```bash
87+
$ make INTERFACE64=1 SYMBOLSUFFIX=64_
88+
```
89+
90+
This will produce libraries named `libopenblas64_.so|a`, a pkg-config file
91+
named `openblas64.pc`, and CMake and header files.
92+
93+
Installing locally and inspecting the output will show a few more details:
94+
```bash
95+
$ make install PREFIX=$PWD/../openblas/make64 INTERFACE64=1 SYMBOLSUFFIX=64_
96+
$ tree . # output slightly edited down
97+
.
98+
├── include
99+
│   ├── cblas.h
100+
│   ├── f77blas.h
101+
│   ├── lapacke_config.h
102+
│   ├── lapacke.h
103+
│   ├── lapacke_mangling.h
104+
│   ├── lapacke_utils.h
105+
│   ├── lapack.h
106+
│   └── openblas_config.h
107+
└── lib
108+
    ├── cmake
109+
    │   └── openblas
110+
    │       ├── OpenBLASConfig.cmake
111+
    │       └── OpenBLASConfigVersion.cmake
112+
    ├── libopenblas64_.a
113+
    ├── libopenblas64_.so
114+
    └── pkgconfig
115+
        └── openblas64.pc
116+
```
117+
118+
The symbol names are a key point. Each one consists of the LP64 symbol name,
119+
then (for Fortran only) the compiler mangling, and then the `64_` symbol suffix.
120+
Hence to obtain the final symbol names, we need to take into account which
121+
Fortran compiler we are using. For the most common cases (e.g., gfortran, Intel
122+
Fortran, or Flang), that means appending a single underscore. In that case, the
123+
result is:
124+
125+
| base API name | binary symbol name | call from Fortran code | call from C code |
126+
|---------------|--------------------|------------------------|-----------------------|
127+
| `dgemm` | `dgemm_64_` | `dgemm_64(...)` | `dgemm_64_(...)` |
128+
| `cblas_dgemm` | `cblas_dgemm64_` | n/a | `cblas_dgemm64_(...)` |
129+
130+
It is quite useful to have these symbol names be as uniform as possible across
131+
different packaging systems.
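
The composition rule behind the table can be sketched as a small shell function. The name `ilp64_symbol_name` is hypothetical and exists only for illustration; the mangling argument is `_` for the common Fortran compilers mentioned above and empty for C APIs such as CBLAS.

```bash
# Compose an ILP64 symbol name under the `64_` convention:
# base name, then Fortran compiler mangling (if any), then the `64_` suffix.
ilp64_symbol_name() {
    base="$1"      # e.g. dgemm or cblas_dgemm
    mangling="$2"  # "_" for single-underscore mangling, "" for C APIs
    printf '%s%s64_\n' "$base" "$mangling"
}

ilp64_symbol_name dgemm _         # Fortran BLAS routine
ilp64_symbol_name cblas_dgemm ""  # C API, no Fortran mangling
```

On a real build, listing the dynamic symbols of the library (for instance with `nm -D`) is one way to confirm that the exported names follow this pattern.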
132+
133+
The equivalent build options with CMake are:
134+
```bash
135+
$ mkdir build && cd build
136+
$ cmake .. -DINTERFACE64=1 -DSYMBOLSUFFIX=64_ -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON
137+
$ cmake --build . -j
138+
```
139+
140+
Note that the result is not 100% identical to the Make result. For example, the
141+
library name ends in `_64` rather than `64_`; it is recommended to rename the libraries
142+
to match the Make library names (also update the `libsuffix` entry in
143+
`openblas64.pc` to match that rename).
144+
```bash
145+
$ cmake --install . --prefix $PWD/../../openblas/cmake64
146+
$ tree .
147+
.
148+
├── include
149+
│   └── openblas64
150+
│       ├── cblas.h
151+
│       ├── f77blas.h
152+
│       ├── lapacke_config.h
153+
│       ├── lapacke_example_aux.h
154+
│       ├── lapacke.h
155+
│       ├── lapacke_mangling.h
156+
│       ├── lapacke_utils.h
157+
│       ├── lapack.h
158+
│       ├── openblas64
159+
│       │   └── lapacke_mangling.h
160+
│       └── openblas_config.h
161+
└── lib
162+
    ├── cmake
163+
    │   └── OpenBLAS64
164+
    │       ├── OpenBLAS64Config.cmake
165+
    │       ├── OpenBLAS64ConfigVersion.cmake
166+
    │       ├── OpenBLAS64Targets.cmake
167+
    │       └── OpenBLAS64Targets-noconfig.cmake
168+
    ├── libopenblas_64.a
169+
    ├── libopenblas_64.so -> libopenblas_64.so.0
170+
    └── pkgconfig
171+
        └── openblas64.pc
172+
```
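
The rename recommended for the CMake outputs could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported procedure: the function name `rename_ilp64_artifacts` is hypothetical, and the `sed` expression assumes the generated `openblas64.pc` contains a line reading exactly `libsuffix=_64`, which you should confirm against your own build.

```bash
# Rename CMake-style ILP64 artifacts (libopenblas_64.*) in one lib directory
# to the Make-style names (libopenblas64_.*), then update the .pc file.
rename_ilp64_artifacts() {
    libdir="$1"
    for path in "$libdir"/libopenblas_64.*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$path" ] || [ -L "$path" ] || continue
        old="${path##*/}"
        new="libopenblas64_.${old#libopenblas_64.}"
        if [ -L "$path" ]; then
            # Recreate symlinks so that they point at the renamed target.
            target="$(readlink "$path" | sed 's/libopenblas_64\./libopenblas64_./')"
            rm "$path"
            ln -s "$target" "$libdir/$new"
        else
            mv "$path" "$libdir/$new"
        fi
    done
    pc="$libdir/pkgconfig/openblas64.pc"
    if [ -f "$pc" ]; then
        # Assumption: the file contains the line `libsuffix=_64`.
        sed 's/^libsuffix=_64$/libsuffix=64_/' "$pc" > "$pc.tmp" && mv "$pc.tmp" "$pc"
    fi
}
```

Renaming files does not change the SONAME recorded inside the shared library, so it is worth verifying the result on a test system before shipping it.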
173+
174+
175+
### The upcoming standardized ILP64 convention
176+
177+
While the `64_` convention above got some adoption, it's slightly hacky and is
178+
implemented through the use of `objcopy`. An effort is ongoing for a more
179+
broadly adopted convention in the reference BLAS and LAPACK libraries, using
180+
(a) the `_64` suffix, and (b) applying that suffix _before_ rather than after
181+
Fortran compiler mangling. The central issue for this is
182+
[lapack#666](https://github.com/Reference-LAPACK/lapack/issues/666).
183+
184+
For the most common cases of compiler mangling (a single `_` appended), the end
185+
result will be:
186+
187+
| base API name | binary symbol name | call from Fortran code | call from C code |
188+
|---------------|--------------------|------------------------|-----------------------|
189+
| `dgemm` | `dgemm_64_` | `dgemm_64(...)` | `dgemm_64_(...)` |
190+
| `cblas_dgemm` | `cblas_dgemm_64` | n/a | `cblas_dgemm_64(...)` |
191+
192+
For other compiler mangling schemes, replace the trailing `_` by the scheme in use.
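
The ordering difference in this convention (suffix first, mangling second) can be sketched in the same way. The function name `std_ilp64_symbol_name` is hypothetical; as before, the mangling argument is `_` for the common Fortran compilers and empty for C APIs.

```bash
# Compose an ILP64 symbol name under the upcoming `_64` convention:
# base name, then the `_64` suffix, then Fortran compiler mangling (if any).
std_ilp64_symbol_name() {
    base="$1"      # e.g. dgemm or cblas_dgemm
    mangling="$2"  # "_" for single-underscore mangling, "" for C APIs
    printf '%s_64%s\n' "$base" "$mangling"
}

std_ilp64_symbol_name dgemm _         # Fortran BLAS routine
std_ilp64_symbol_name cblas_dgemm ""  # C API, no Fortran mangling
```

With single-underscore mangling the Fortran symbols come out the same under both conventions; only the C API names such as `cblas_dgemm` differ.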
193+
194+
The shared library name for this `_64` convention should be `libopenblas_64.so`.
195+
196+
Note: it is not yet possible to produce an OpenBLAS build which employs this
197+
convention! Once reference BLAS and LAPACK with support for `_64` have been
198+
released, a future OpenBLAS release will support it. For now, please use the
199+
older `64_` scheme and avoid using the name `libopenblas_64.so`; it should be
200+
considered reserved for future use of the `_64` standard as prescribed by
201+
reference BLAS/LAPACK.
202+
203+
204+
## Performance and runtime behavior related build options
205+
206+
For these options there are multiple reasonable or common choices.
207+
208+
### Threading related options
209+
210+
OpenBLAS can be built as a multi-threaded or single-threaded library, with the
211+
default being multi-threaded. It's expected that the default `libopenblas`
212+
library is multi-threaded; if you'd like to also distribute single-threaded
213+
builds, consider naming them `libopenblas_sequential`.
214+
215+
OpenBLAS can be built with pthreads or OpenMP as the threading model, with the
216+
default being pthreads. Both options are commonly used, and the choice here
217+
should not influence the shared library name. The choice will be captured by
218+
the `.pc` file, for example:
219+
```bash
220+
$ pkg-config --libs openblas
221+
-fopenmp -lopenblas
222+
223+
$ cat openblas.pc
224+
...
225+
openblas_config= ... USE_OPENMP=1 MAX_THREADS=24
226+
```
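
Downstream build scripts may want to read these settings back. The helper below is a minimal sketch: the name `pc_config_value` is hypothetical, and it assumes the `openblas_config=` line holds space-separated `KEY=VALUE` entries as in the excerpt above.

```bash
# Print the value of one KEY=VALUE entry from the `openblas_config=` line
# of a pkg-config file. Prints nothing if the key is absent.
pc_config_value() {
    pcfile="$1"
    key="$2"
    sed -n 's/^openblas_config=//p' "$pcfile" \
        | tr ' ' '\n' \
        | sed -n "s/^${key}=//p" \
        | head -n 1
}

# Example with a made-up sample file; the values are illustrative only.
sample="$(mktemp)"
printf 'openblas_config= USE_OPENMP=1 MAX_THREADS=64\n' > "$sample"
pc_config_value "$sample" USE_OPENMP
pc_config_value "$sample" MAX_THREADS
rm -f "$sample"
```

When pkg-config itself is available, `pkg-config --variable=openblas_config openblas` should return the same line without having to locate the file.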
227+
228+
The maximum number of threads users will be able to use is determined at build
229+
time by the `NUM_THREADS` build option. It defaults to 24, and there's a wide
230+
range of values that are reasonable to use (up to 256). 64 is a typical choice
231+
here; there is a memory footprint penalty that is linear in `NUM_THREADS`.
232+
Please see `Makefile.rule` for more details.
233+
234+
### CPU architecture related options
235+
236+
OpenBLAS contains a lot of CPU architecture-specific optimizations, hence when
237+
distributing to a user base with a variety of hardware, it is recommended to
238+
enable CPU architecture runtime detection. This will dynamically select
239+
optimized kernels for individual APIs. To do this, use the `DYNAMIC_ARCH=1`
240+
build option. This is usually done on all common CPU families, except when
241+
there are known issues.
242+
243+
In case the CPU architecture is known (e.g. you're building binaries for macOS
244+
M1 users), it is possible to specify the target architecture directly with the
245+
`TARGET=` build option.
246+
247+
`DYNAMIC_ARCH` and `TARGET` are covered in more detail in the main `README.md`
248+
in this repository.
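
The choice between the two options can be expressed as a tiny helper in a packaging script. This is only a sketch: the function name `arch_build_option` is hypothetical, and `HASWELL` is used as an example target name that is assumed to be supported by your OpenBLAS version.

```bash
# Choose the architecture-related Make option for a build:
# a fixed TARGET when the CPU is known, runtime detection otherwise.
arch_build_option() {
    target="$1"   # e.g. HASWELL; empty when the users' CPUs are unknown
    if [ -n "$target" ]; then
        printf 'TARGET=%s\n' "$target"
    else
        printf 'DYNAMIC_ARCH=1\n'
    fi
}

# Example: a general-purpose package with a mixed user base would run
#   make "$(arch_build_option "")"
# while a build for one known machine type would pass its target name.
```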
249+
250+
251+
## Real-world examples
252+
253+
OpenBLAS is likely to be distributed in one of these distribution models:
254+
255+
1. As a standalone package, or multiple packages, in a packaging ecosystem like
256+
a Linux distro, Homebrew, conda-forge or MSYS2.
257+
2. Vendored as part of a larger package, e.g. in Julia, NumPy, SciPy, or R.
258+
3. Locally, e.g. made available as a build on a single HPC cluster.
259+
260+
The guidance on this page is most important for models (1) and (2). These links
261+
to build recipes for a representative selection of packaging systems may be
262+
helpful as a reference:
263+
264+
- [Fedora](https://src.fedoraproject.org/rpms/openblas/blob/rawhide/f/openblas.spec)
265+
- [Debian](https://salsa.debian.org/science-team/openblas/-/blob/master/debian/rules)
266+
- [Homebrew](https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/openblas.rb)
267+
- [MSYS2](https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-openblas/PKGBUILD)
268+
- [conda-forge](https://github.com/conda-forge/openblas-feedstock/blob/main/recipe/build.sh)
269+
- [NumPy/SciPy](https://github.com/MacPython/openblas-libs/blob/main/tools/build_openblas.sh)
270+
- [Nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/science/math/openblas/default.nix)
