|
| 1 | +# Guidance for redistributing OpenBLAS |
| 2 | + |
| 3 | +*We note that this document contains recommendations only - packagers and other |
| 4 | +redistributors are in charge of how OpenBLAS is built and distributed in their |
| 5 | +systems, and may have good reasons to deviate from the guidance given on this |
| 6 | +page. These recommendations are aimed at general packaging systems, with a user |
| 7 | +base that is typically large, open source (or at least freely available), does |
| 8 | +not behave uniformly, and is not directly connected with the packager.* |
| 9 | + |
| 10 | +OpenBLAS has a large number of build-time options which can be used to change |
| 11 | +how it behaves at runtime, how artifacts or symbols are named, etc. Variation |
| 12 | +in build configuration can be necessary to achieve a given end goal within a |
| 13 | +distribution or as an end user. However, such variation can also make it more |
| 14 | +difficult to build on top of OpenBLAS and ship code or other packages in a way |
| 15 | +that works across many different distros. Here we provide guidance about the |
| 16 | +most important build options, what effects they may have when changed, and |
| 17 | +which ones to default to. |
| 18 | + |
| 19 | +The Make and CMake build systems provide equivalent options and yield nearly |
| 20 | +identical artifacts, though not exactly the same (the CMake builds are still |
| 21 | +experimental). You can choose either one and the options will function in the |
| 22 | +same way; however, the CMake outputs may require some renaming. To review |
| 23 | +available build options, see `Makefile.rule` or `CMakeLists.txt` in the root of |
| 24 | +the repository. |
| 25 | + |
| 26 | +Build options typically fall into two categories: (a) options that affect the |
| 27 | +user interface, such as library and symbol names or APIs that are made |
| 28 | +available, and (b) options that affect performance and runtime behavior, such |
| 29 | +as threading behavior or CPU architecture-specific code paths. The user |
| 30 | +interface options are more important to keep aligned between distributions, |
| 31 | +while for the performance-related options there are typically more reasons to |
| 32 | +make choices that deviate from the defaults. |
| 33 | + |
| 34 | +Here are recommendations for user interface related packaging choices where it |
| 35 | +is not likely to be a good idea to deviate (typically these are the default |
| 36 | +settings): |
| 37 | + |
| 38 | +1. Include CBLAS. The CBLAS interface is widely used and it doesn't affect |
| 39 | + binary size much, so don't turn it off. |
| 40 | +2. Include LAPACK and LAPACKE. The LAPACK interface is also widely used, and |
| 41 | + while it does make up a significant part of the binary size of the installed |
| 42 | + library, that does not outweigh the regression in usability when deviating |
| 43 | + from the default here.[^1] |
| 44 | +3. Always distribute the pkg-config (`.pc`) and CMake (`.cmake`) dependency |
| 45 | + detection files. These files are used by build systems when users want to |
| 46 | + link against OpenBLAS, and there is no benefit to leaving them out. |
| 47 | +4. Provide the LP64 interface by default, and if in addition to that you choose |
| 48 | + to provide an ILP64 interface build as well, use a symbol suffix to avoid |
| 49 | + symbol name clashes (see the next section). |
| 50 | + |
| 51 | +[^1]: All major distributions do include LAPACK as of mid 2023 as far as we |
| 52 | +know. Older versions of Arch Linux did not, and that was known to cause |
| 53 | +problems. |
| 54 | + |
| 55 | + |
| 56 | +## ILP64 interface builds |
| 57 | + |
| 58 | +The LP64 (32-bit integer) interface is the default build, and has |
| 59 | +well-established C and Fortran APIs as determined by the reference (Netlib) |
| 60 | +BLAS and LAPACK libraries. The ILP64 (64-bit integer) interface however does |
| 61 | +not have a standard API: symbol names and shared/static library names can be |
| 62 | +produced in multiple ways, and this tends to make it difficult to use. |
| 63 | +As of today there is an agreed-upon way of choosing names for OpenBLAS between |
| 64 | +a number of key users/redistributors, which is the closest thing to a standard |
| 65 | +that there is now. However, there is an ongoing standardization effort in the |
| 66 | +reference BLAS and LAPACK libraries, which differs from the current OpenBLAS |
| 67 | +agreed-upon convention. In this section we'll aim to explain both. |
| 68 | + |
| 69 | +Those two methods are fairly similar, and have a key thing in common: *using a |
| 70 | +symbol suffix*. This is good practice; if you distribute an ILP64 build, we |
| 71 | +recommend that it use a symbol suffix containing `64` in the name. |
| 72 | +This avoids potential symbol clashes when different packages which depend on |
| 73 | +OpenBLAS load both an LP64 and an ILP64 library into memory at the same time. |
| 74 | + |
| 75 | +### The current OpenBLAS agreed-upon ILP64 convention |
| 76 | + |
| 77 | +This convention comprises the shared library name and the symbol suffix in the |
| 78 | +shared library. The symbol suffix to use is `64_`, implying that the library |
| 79 | +name will be `libopenblas64_.so` and the symbols in that library end in `64_`. |
| 80 | +The central issue where this was discussed is |
| 81 | +[openblas#646](https://github.com/xianyi/OpenBLAS/issues/646), and adopters |
| 82 | +include Fedora, Julia, NumPy and SciPy - SuiteSparse already used it as well. |
| 83 | + |
| 84 | +To build shared and static libraries with the currently recommended ILP64 |
| 85 | +conventions with Make: |
| 86 | +```bash |
| 87 | +$ make INTERFACE64=1 SYMBOLSUFFIX=64_ |
| 88 | +``` |
| 89 | + |
| 90 | +This will produce libraries named `libopenblas64_.so|a`, a pkg-config file |
| 91 | +named `openblas64.pc`, and CMake and header files. |
| 92 | + |
| 93 | +Installing locally and inspecting the output will show a few more details: |
| 94 | +```bash |
| 95 | +$ make install PREFIX=$PWD/../openblas/make64 INTERFACE64=1 SYMBOLSUFFIX=64_ |
| 96 | +$ tree . # output slightly edited down |
| 97 | +. |
| 98 | +├── include |
| 99 | +│ ├── cblas.h |
| 100 | +│ ├── f77blas.h |
| 101 | +│ ├── lapacke_config.h |
| 102 | +│ ├── lapacke.h |
| 103 | +│ ├── lapacke_mangling.h |
| 104 | +│ ├── lapacke_utils.h |
| 105 | +│ ├── lapack.h |
| 106 | +│ └── openblas_config.h |
| 107 | +└── lib |
| 108 | + ├── cmake |
| 109 | + │ └── openblas |
| 110 | + │ ├── OpenBLASConfig.cmake |
| 111 | + │ └── OpenBLASConfigVersion.cmake |
| 112 | + ├── libopenblas64_.a |
| 113 | + ├── libopenblas64_.so |
| 114 | + └── pkgconfig |
| 115 | + └── openblas64.pc |
| 116 | +``` |
| 117 | + |
| 118 | +The symbol names are a key point. They consist of the LP64 symbol name, then |
| 119 | +(for Fortran only) the compiler mangling, and then the `64_` symbol suffix. |
| 120 | +Hence to obtain the final symbol names, we need to take into account which |
| 121 | +Fortran compiler we are using. For the most common cases (e.g., gfortran, Intel |
| 122 | +Fortran, or Flang), that means appending a single underscore. In that case, the |
| 123 | +result is: |
| 124 | + |
| 125 | +| base API name | binary symbol name | call from Fortran code | call from C code | |
| 126 | +|---------------|--------------------|------------------------|-----------------------| |
| 127 | +| `dgemm` | `dgemm_64_` | `dgemm_64(...)` | `dgemm_64_(...)` | |
| 128 | +| `cblas_dgemm` | `cblas_dgemm64_` | n/a | `cblas_dgemm64_(...)` | |
| 129 | + |
| 130 | +It is quite useful to have these symbol names be as uniform as possible across |
| 131 | +different packaging systems. |
| 132 | + |
| 133 | +The equivalent build options with CMake are: |
| 134 | +```bash |
| 135 | +$ mkdir build && cd build |
| 136 | +$ cmake .. -DINTERFACE64=1 -DSYMBOLSUFFIX=64_ -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON |
| 137 | +$ cmake --build . -j |
| 138 | +``` |
| 139 | + |
| 140 | +Note that the result is not 100% identical to the Make result. For example, the |
| 141 | +library name ends in `_64` rather than `64_`; it is recommended to rename the |
| 142 | +libraries to match the Make library names (and to update the `libsuffix` entry |
| 143 | +in `openblas64.pc` to match that rename). |
| 144 | +```bash |
| 145 | +$ cmake --install . --prefix $PWD/../../openblas/cmake64 |
| 146 | +$ tree . |
| 147 | +. |
| 148 | +├── include |
| 149 | +│ └── openblas64 |
| 150 | +│ ├── cblas.h |
| 151 | +│ ├── f77blas.h |
| 152 | +│ ├── lapacke_config.h |
| 153 | +│ ├── lapacke_example_aux.h |
| 154 | +│ ├── lapacke.h |
| 155 | +│ ├── lapacke_mangling.h |
| 156 | +│ ├── lapacke_utils.h |
| 157 | +│ ├── lapack.h |
| 158 | +│ ├── openblas64 |
| 159 | +│ │ └── lapacke_mangling.h |
| 160 | +│ └── openblas_config.h |
| 161 | +└── lib |
| 162 | + ├── cmake |
| 163 | + │ └── OpenBLAS64 |
| 164 | + │ ├── OpenBLAS64Config.cmake |
| 165 | + │ ├── OpenBLAS64ConfigVersion.cmake |
| 166 | + │ ├── OpenBLAS64Targets.cmake |
| 167 | + │ └── OpenBLAS64Targets-noconfig.cmake |
| 168 | + ├── libopenblas_64.a |
| 169 | + ├── libopenblas_64.so -> libopenblas_64.so.0 |
| 170 | + └── pkgconfig |
| 171 | + └── openblas64.pc |
| 172 | +``` |
| 173 | + |
| 174 | + |
| 175 | +### The upcoming standardized ILP64 convention |
| 176 | + |
| 177 | +While the `64_` convention above got some adoption, it's slightly hacky and is |
| 178 | +implemented through the use of `objcopy`. An effort is ongoing for a more |
| 179 | +broadly adopted convention in the reference BLAS and LAPACK libraries, using |
| 180 | +(a) the `_64` suffix, and (b) applying that suffix _before_ rather than after |
| 181 | +Fortran compiler mangling. The central issue for this is |
| 182 | +[lapack#666](https://github.com/Reference-LAPACK/lapack/issues/666). |
| 183 | + |
| 184 | +For the most common cases of compiler mangling (a single `_` appended), the end |
| 185 | +result will be: |
| 186 | + |
| 187 | +| base API name | binary symbol name | call from Fortran code | call from C code | |
| 188 | +|---------------|--------------------|------------------------|-----------------------| |
| 189 | +| `dgemm` | `dgemm_64_` | `dgemm_64(...)` | `dgemm_64_(...)` | |
| 190 | +| `cblas_dgemm` | `cblas_dgemm_64` | n/a | `cblas_dgemm_64(...)` | |
| 191 | + |
| 192 | +For other compiler mangling schemes, replace the trailing `_` by the scheme in use. |
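
The two naming rules can be compared side by side with a small shell sketch. The helper name `ilp64_symbol` is hypothetical (it is not part of OpenBLAS), and the sketch assumes the common Fortran mangling of a single appended underscore.

```bash
# Hypothetical helper: compose the binary symbol name for an ILP64 build.
# Arguments: convention (64_ or _64), base API name, language (fortran or c).
# Assumes Fortran compiler mangling appends a single underscore.
ilp64_symbol() {
    convention=$1
    name=$2
    lang=$3
    mangle=""
    if [ "$lang" = fortran ]; then
        mangle="_"
    fi
    case $convention in
        64_) printf '%s\n' "${name}${mangle}64_" ;;  # suffix after mangling
        _64) printf '%s\n' "${name}_64${mangle}" ;;  # suffix before mangling
        *)   echo "unknown convention: $convention" >&2; return 1 ;;
    esac
}

# Print the symbol names for both conventions.
for conv in 64_ _64; do
    for spec in dgemm:fortran cblas_dgemm:c; do
        printf '%-4s %-12s -> %s\n' "$conv" "${spec%%:*}" \
            "$(ilp64_symbol "$conv" "${spec%%:*}" "${spec##*:}")"
    done
done
```

Under this mangling assumption the Fortran symbols come out the same in both conventions, while the C symbols differ (`cblas_dgemm64_` versus `cblas_dgemm_64`).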
| 193 | + |
| 194 | +The shared library name for this `_64` convention should be `libopenblas_64.so`. |
| 195 | + |
| 196 | +Note: it is not yet possible to produce an OpenBLAS build which employs this |
| 197 | +convention! Once reference BLAS and LAPACK with support for `_64` have been |
| 198 | +released, a future OpenBLAS release will support it. For now, please use the |
| 199 | +older `64_` scheme and avoid using the name `libopenblas_64.so`; it should be |
| 200 | +considered reserved for future use of the `_64` standard as prescribed by |
| 201 | +reference BLAS/LAPACK. |
| 202 | + |
| 203 | + |
| 204 | +## Performance and runtime behavior related build options |
| 205 | + |
| 206 | +For these options there are multiple reasonable or common choices. |
| 207 | + |
| 208 | +### Threading related options |
| 209 | + |
| 210 | +OpenBLAS can be built as a multi-threaded or single-threaded library, with the |
| 211 | +default being multi-threaded. It's expected that the default `libopenblas` |
| 212 | +library is multi-threaded; if you'd like to also distribute single-threaded |
| 213 | +builds, consider naming them `libopenblas_sequential`. |
| 214 | + |
| 215 | +OpenBLAS can be built with pthreads or OpenMP as the threading model, with the |
| 216 | +default being pthreads. Both options are commonly used, and the choice here |
| 217 | +should not influence the shared library name. The choice will be captured by |
| 218 | +the `.pc` file. For example: |
| 219 | +```bash |
| 220 | +$ pkg-config --libs openblas |
| 221 | +-fopenmp -lopenblas |
| 222 | + |
| 223 | +$ cat openblas.pc |
| 224 | +... |
| 225 | +openblas_config= ... USE_OPENMP=0 MAX_THREADS=24 |
| 226 | +``` |
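
A downstream build script may want to read individual settings out of that `openblas_config` line. The following is a minimal sketch; the helper name `openblas_config_value` is hypothetical, and the sample input is modeled on the abbreviated line shown above rather than on a complete `.pc` file.

```bash
# Hypothetical helper: print the value of KEY from the openblas_config
# line of a pkg-config file read on stdin.
openblas_config_value() {
    key=$1
    grep '^openblas_config=' | tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Sample input, modeled on the abbreviated openblas.pc line shown above.
sample='openblas_config= USE_OPENMP=0 MAX_THREADS=24'

printf '%s\n' "$sample" | openblas_config_value USE_OPENMP
printf '%s\n' "$sample" | openblas_config_value MAX_THREADS
```

With OpenBLAS installed, the same helper could be fed the real file, e.g. `openblas_config_value MAX_THREADS < /path/to/openblas.pc`.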
| 227 | + |
| 228 | +The maximum number of threads users will be able to use is determined at build |
| 229 | +time by the `NUM_THREADS` build option. It defaults to 24, and there's a wide |
| 230 | +range of values that are reasonable to use (up to 256). 64 is a typical choice |
| 231 | +here; there is a memory footprint penalty that is linear in `NUM_THREADS`. |
| 232 | +Please see `Makefile.rule` for more details. |
| 233 | + |
| 234 | +### CPU architecture related options |
| 235 | + |
| 236 | +OpenBLAS contains a lot of CPU architecture-specific optimizations, hence when |
| 237 | +distributing to a user base with a variety of hardware, it is recommended to |
| 238 | +enable CPU architecture runtime detection. This will dynamically select |
| 239 | +optimized kernels for individual APIs. To do this, use the `DYNAMIC_ARCH=1` |
| 240 | +build option. This is usually done on all common CPU families, except when |
| 241 | +there are known issues. |
| 242 | + |
| 243 | +In case the CPU architecture is known (e.g. you're building binaries for macOS |
| 244 | +M1 users), it is possible to specify the target architecture directly with the |
| 245 | +`TARGET=` build option. |
| 246 | + |
| 247 | +`DYNAMIC_ARCH` and `TARGET` are covered in more detail in the main `README.md` |
| 248 | +in this repository. |
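
As a rough illustration of how the options on this page combine, the sketch below prints a `make` command line for a distribution-style build. The helper name `openblas_make_cmd` is hypothetical; it only assembles options named on this page and does not run a build.

```bash
# Hypothetical helper: print a make command line for common packaging choices.
# Arguments: interface (lp64 or ilp64), maximum thread count.
openblas_make_cmd() {
    interface=$1
    num_threads=$2
    # Runtime CPU detection plus an explicit thread limit.
    cmd="make DYNAMIC_ARCH=1 NUM_THREADS=$num_threads"
    if [ "$interface" = ilp64 ]; then
        # ILP64 builds get the 64_ symbol suffix to avoid symbol clashes.
        cmd="$cmd INTERFACE64=1 SYMBOLSUFFIX=64_"
    fi
    printf '%s\n' "$cmd"
}

openblas_make_cmd lp64 64
openblas_make_cmd ilp64 64
```

When the target hardware is known in advance, `DYNAMIC_ARCH=1` would be replaced by a `TARGET=` setting instead.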
| 249 | + |
| 250 | + |
| 251 | +## Real-world examples |
| 252 | + |
| 253 | +OpenBLAS is likely to be distributed in one of these distribution models: |
| 254 | + |
| 255 | +1. As a standalone package, or multiple packages, in a packaging ecosystem like |
| 256 | + a Linux distro, Homebrew, conda-forge or MSYS2. |
| 257 | +2. Vendored as part of a larger package, e.g. in Julia, NumPy, SciPy, or R. |
| 258 | +3. Locally, e.g. made available as a build on a single HPC cluster. |
| 259 | + |
| 260 | +The guidance on this page is most important for models (1) and (2). These links |
| 261 | +to build recipes for a representative selection of packaging systems may be |
| 262 | +helpful as a reference: |
| 263 | + |
| 264 | +- [Fedora](https://src.fedoraproject.org/rpms/openblas/blob/rawhide/f/openblas.spec) |
| 265 | +- [Debian](https://salsa.debian.org/science-team/openblas/-/blob/master/debian/rules) |
| 266 | +- [Homebrew](https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/openblas.rb) |
| 267 | +- [MSYS2](https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-openblas/PKGBUILD) |
| 268 | +- [conda-forge](https://github.com/conda-forge/openblas-feedstock/blob/main/recipe/build.sh) |
| 269 | +- [NumPy/SciPy](https://github.com/MacPython/openblas-libs/blob/main/tools/build_openblas.sh) |
| 270 | +- [Nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/science/math/openblas/default.nix) |