Commit c1b9bb8

docs: improvements to the User Manual
1 parent 237c2c4 commit c1b9bb8

1 file changed

+166
-40
lines changed

docs/user_manual.md

Lines changed: 166 additions & 40 deletions
@@ -1,70 +1,174 @@
1-
## Compile the library
1+
2+
This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS,
3+
example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting
4+
tips. Compiling OpenBLAS is optional, since you may be able to install with a
5+
package manager.
6+
7+
!!! Note BLAS API reference documentation
8+
9+
The OpenBLAS documentation does not contain API reference documentation for
10+
BLAS or LAPACK, since these are standardized APIs, the documentation for
11+
which can be found in other places. If you want to understand every BLAS
12+
function and definition, we recommend reading the
13+
[Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation)
14+
or the [Netlib BLAS documentation](http://netlib.org/blas/).
15+
16+
OpenBLAS does contain a limited number of non-standard functions; these are
17+
documented at [OpenBLAS extension functions](extensions.md).
18+
19+
20+
## Compiling OpenBLAS
21+
222
### Normal compile
3-
* type `make` to detect the CPU automatically.
4-
or
5-
* type `make TARGET=xxx` to set target CPU, e.g. `make TARGET=NEHALEM`. The full target list is in file TargetList.txt.
623

7-
### Cross compile
8-
Please set `CC` and `FC` with the cross toolchains. Then, set `HOSTCC` with your host C compiler. At last, set `TARGET` explicitly.
24+
The default way to build and install OpenBLAS from source is with Make:
25+
```
26+
make # add `-j4` to compile in parallel with 4 processes
27+
make install
28+
```
929

10-
Examples:
30+
By default, the CPU architecture is detected automatically when invoking
31+
`make`, and the build is optimized for the detected CPU. To override the
32+
autodetection, use the `TARGET` flag:
1133

12-
* On x86 box, compile the library for ARM Cortex-A9 linux.
34+
```
35+
# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:
36+
make TARGET=NEHALEM
37+
```
38+
The full list of known target CPU architectures can be found in
39+
`TargetList.txt` in the root of the repository.
1340

14-
Install only gnueabihf versions. Please check https://github.com/xianyi/OpenBLAS/issues/936#issuecomment-237596847
41+
### Cross compile
1542

16-
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
43+
For a basic cross-compilation with Make, three steps need to be taken:
1744

18-
* On X86 box, compile this library for loongson3a CPU.
45+
- Set the `CC` and `FC` environment variables to select the cross toolchains
46+
for C and Fortran.
47+
- Set the `HOSTCC` environment variable to select the host C compiler (i.e. the
48+
regular C compiler for the machine on which you are invoking the build).
49+
- Set `TARGET` explicitly to the CPU architecture on which the produced
50+
OpenBLAS binaries will be used.
1951

52+
#### Cross-compilation examples
53+
54+
Compile the library for ARM Cortex-A9 Linux on an x86-64 machine
55+
_(note: install only `gnueabihf` versions of the cross toolchain - see
56+
[this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847)
57+
for why_):
2058
```
21-
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
59+
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
2260
```
2361

24-
* On X86 box, compile this library for loongson3a CPU with loongcc (based on Open64) compiler.
62+
Compile OpenBLAS for a Loongson 3A CPU on an x86-64 machine:
63+
```
64+
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
65+
```
2566

67+
Compile OpenBLAS for a Loongson 3A CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine:
2668
```
2769
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
2870
```
2971

30-
### Debug version
72+
### Building a debug version
3173

32-
make DEBUG=1
74+
Add `DEBUG=1` to your build command, e.g.:
75+
```
76+
make DEBUG=1
77+
```
3378

34-
### Install to the directory (optional)
79+
### Install to a specific directory
3580

36-
Example:
81+
!!! note
3782

38-
make install PREFIX=your_installation_directory
83+
Installing to a directory is optional; it is also possible to use the shared or static
84+
libraries directly from the build directory.
3985

40-
The default directory is /opt/OpenBLAS. Note that any flags passed to `make` during build should also be passed to `make install` to circumvent any install errors, i.e. some headers not being copied over correctly.
86+
Use `make install` with the `PREFIX` flag to install to a specific directory:
4187

42-
For more information, please read [Installation Guide](install.md).
88+
```
89+
make install PREFIX=/path/to/installation/directory
90+
```
91+
92+
The default directory is `/opt/OpenBLAS`.
93+
94+
!!! important
95+
96+
Note that any flags passed to `make` during build should also be passed to
97+
`make install` to avoid install errors, e.g. some headers not
98+
being copied over correctly.
4399

44-
## Link the library
100+
For more detailed information on building/installing from source, please read
101+
the [Installation Guide](install.md).
45102

46-
* Link shared library
47103

104+
## Linking to OpenBLAS
105+
106+
OpenBLAS can be used as a shared or a static library.
107+
108+
### Link a shared library
109+
110+
The shared library is normally called `libopenblas.so`, but note that the name
111+
may be different as a result of build flags used or naming choices by a distro
112+
packager (see [distributing.md] for details). To link a shared library named
113+
`libopenblas.so`, the flag `-lopenblas` is needed. To find the OpenBLAS headers,
114+
a `-I/path/to/includedir` is needed. And unless the library is installed in a
115+
directory that the linker searches by default, also `-L` and `-Wl,-rpath` flags
116+
are needed. For a source file `test.c` (e.g., the example code under _Call
117+
CBLAS interface_ further down), the shared library can then be linked with:
48118
```
49119
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
50120
```
51121

52-
The `-Wl,-rpath,/your_path/OpenBLAS/lib` option to linker can be omitted if you ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a location that is part of the `ld.so` default search path (usually /lib,/usr/lib and /usr/local/lib). Alternatively, you can set the environment variable LD_LIBRARY_PATH to point to the folder that contains libopenblas.so. Otherwise, linking at runtime will fail with a message like `cannot open shared object file: no such file or directory`
122+
The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you
123+
ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in
124+
`/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a
125+
location that is part of the `ld.so` default search path (usually `/lib`,
126+
`/usr/lib` and `/usr/local/lib`). Alternatively, you can set the environment
127+
variable `LD_LIBRARY_PATH` to point to the folder that contains `libopenblas.so`.
128+
Otherwise, the build may succeed but at runtime loading the library will fail
129+
with a message like:
130+
```
131+
cannot open shared object file: no such file or directory
132+
```
53133

54-
If the library is multithreaded, please add `-lpthread`. If the library contains LAPACK functions, please add `-lgfortran` or other Fortran libs, although if you only make calls to LAPACKE routines, i.e. your code has `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`, `-lgfortran` is not needed.
134+
More flags may be needed, depending on how OpenBLAS was built:
55135

56-
* Link static library
136+
- If `libopenblas` is multi-threaded, please add `-lpthread`.
137+
- If the library contains LAPACK functions (usually also true), please add
138+
`-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`).
139+
Note that if you only make calls to LAPACKE routines, i.e. your code has
140+
`#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`,
141+
then `-lgfortran` is not needed.
57142

143+
!!! tip Use pkg-config
144+
145+
Usually a pkg-config file (e.g., `openblas.pc`) is installed together
146+
with a `libopenblas` shared library. pkg-config is a tool that will
147+
tell you the exact flags needed for linking. For example:
148+
149+
```
150+
$ pkg-config --cflags openblas
151+
-I/usr/local/include
152+
$ pkg-config --libs openblas
153+
-L/usr/local/lib -lopenblas
154+
```
155+
156+
### Link a static library
157+
158+
Linking a static library is simpler - add the path to the static OpenBLAS
159+
library to the compile command:
58160
```
59161
gcc -o test test.c /your/path/libopenblas.a
60162
```
61163

62-
You can download `test.c` from https://gist.github.com/xianyi/5780018
63164

64165
## Code examples
65166

66167
### Call CBLAS interface
67-
This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656
168+
169+
This example shows calling `cblas_dgemm` in C:
170+
171+
<!-- Source: https://gist.github.com/xianyi/6930656 -->
68172
```c
69173
#include <cblas.h>
70174
#include <stdio.h>
@@ -83,14 +187,17 @@ void main()
83187
}
84188
```
85189

190+
To compile this file, save it as `test_cblas_dgemm.c` and then run:
86191
```
87-
gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
192+
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
88193
```
194+
This produces a `test_cblas_open` executable.
89195

90196
### Call BLAS Fortran interface
91197

92-
This example shows calling dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018
198+
This example shows calling the `dgemm` Fortran interface in C:
93199

200+
<!-- Source: https://gist.github.com/xianyi/5780018 -->
94201
```c
95202
#include "stdio.h"
96203
#include "stdlib.h"
@@ -158,22 +265,41 @@ int main(int argc, char* argv[])
158265
}
159266
```
160267
268+
To compile this file, save it as `time_dgemm.c` and then run:
161269
```
162270
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
163-
./time_dgemm <m> <n> <k>
164271
```
272+
You can then run it as: `./time_dgemm <m> <n> <k>`, with `m`, `n`, and `k` input
273+
parameters to the `time_dgemm` executable.
165274
166-
## Troubleshooting
275+
!!! note
167276
168-
* Please read [Faq](faq.md) at first.
169-
* Please use gcc version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
170-
* Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. The Clang 3.0 will generate the wrong AVX binary code.
171-
* The number of CPUs/Cores should less than or equal to 256. On Linux x86_64(amd64), there is experimental support for up to 1024 CPUs/Cores and 128 numa nodes if you build the library with BIGNUMA=1.
172-
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting the line NO_AFFINITY=1 in Makefile.rule. But this may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
173-
* On Loongson 3A. make test would be failed because of pthread_create error. The error code is EAGAIN. However, it will be OK when you run the same testcase on shell.
277+
When calling the Fortran interface from C, you have to deal with symbol name
278+
differences caused by compiler conventions. That is why the `dgemm_` function
279+
call in the example above has a trailing underscore. That is the convention
280+
used by `gcc`/`gfortran`; other compilers may mangle names differently, so
281+
portable code needs extra support code to handle this. The CBLAS interface may
282+
be more portable when writing C code.
174283
175-
## BLAS reference manual
284+
When writing code that needs to be portable and work across different
285+
platforms and compilers, the above code example is not recommended for
286+
usage. Instead, we advise looking at how OpenBLAS (or BLAS in general, since
287+
this problem isn't specific to OpenBLAS) functions are called in widely
288+
used projects like Julia, SciPy, or R.
176289
177-
If you want to understand every BLAS function and definition, please read [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) or [netlib.org](http://netlib.org/blas/)
178290
179-
Here are [OpenBLAS extension functions](extensions.md)
291+
## Troubleshooting
292+
293+
* Please read the [FAQ](faq.md) first; your problem may be described there.
294+
* Please ensure you are using a compiler recent enough to support the
295+
features your CPU provides (example: GCC versions before 4.6 were known to
296+
not support AVX kernels, and versions before 6.1 AVX512CD kernels).
297+
* The number of CPU cores supported by default is <=256. On Linux x86-64, there
298+
is experimental support for up to 1024 cores and 128 NUMA nodes if you build
299+
the library with `BIGNUMA=1`.
300+
* OpenBLAS does not set processor affinity by default. On Linux, you can enable
301+
processor affinity by commenting out the line `NO_AFFINITY=1` in
302+
`Makefile.rule`.
303+
* On Loongson 3A, `make test` is known to fail with a `pthread_create` error
304+
and an `EAGAIN` error code. However, it will be OK when you run the same
305+
testcase in a shell.
