Commit ca9a0c2

docs: improve extensions page
1 parent: 3eba16c

1 file changed (+28 -17 lines changed)

docs/extensions.md

Lines changed: 28 additions & 17 deletions
```diff
@@ -1,4 +1,9 @@
-* BLAS-like extensions
+OpenBLAS for the most part contains implementations of the reference (Netlib)
+BLAS, CBLAS, LAPACK and LAPACKE interfaces. A few OpenBLAS-specific functions
+are also provided however, which mostly can be seen as "BLAS extensions".
+This page documents those non-standard APIs.
+
+## BLAS-like extensions
 
 | Routine | Data Types | Description |
 | ------------- |:------------- | :---------------|
```

```diff
@@ -9,20 +14,26 @@
 | ?geadd | s,d,c,z | matrix add |
 | ?gemmt | s,d,c,z | gemm but only a triangular part updated|
 
-* BLAS-like and Conversion functions for bfloat16 (available when OpenBLAS was compiled with BUILD_BFLOAT16=1)
-* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
-* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
-* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
-* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
-* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
-* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
-* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
-
-* Utility functions
-* openblas_get_num_threads
-* openblas_set_num_threads
-* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
-* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
-* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
-* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the cpu affinity mask of the given thread to the provided cpuset. (Only available under Linux, with semantics identical to pthread_setaffinity_np)
+
+## bfloat16 functionality
+
+BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was compiled with `BUILD_BFLOAT16=1`):
+
+* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
+* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
+* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
+* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
+* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
+* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
+* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
+
+## Utility functions
+
+* `openblas_get_num_threads`
+* `openblas_set_num_threads`
+* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
+* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
+* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
+* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
+to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
 
```

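The `?geadd` row in the table describes the routine only as "matrix add". Our understanding, which should be checked against OpenBLAS's `cblas.h` before relying on it, is that it updates C in place as C := alpha*A + beta*C for an m-by-n matrix. The plain-C sketch below illustrates those assumed semantics for double precision and column-major storage. The name `ref_dgeadd` and its argument list are hypothetical; this is neither OpenBLAS's implementation nor the exact `cblas_dgeadd` prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine illustrating the assumed ?geadd semantics:
 * C := alpha * A + beta * C, for an m-by-n matrix stored column-major.
 * lda and ldc are leading dimensions (element distance between columns). */
void ref_dgeadd(int m, int n, double alpha, const double *a, int lda,
                double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {          /* walk the columns */
        for (int i = 0; i < m; ++i) {      /* walk the rows of column j */
            c[i + (size_t)j * ldc] = alpha * a[i + (size_t)j * lda]
                                   + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

Because `lda` and `ldc` may be larger than `m` (for example when A or C is a sub-block of a bigger array), rows beyond `m` in each column are never read or written.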
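
`?gemmt` is described as "gemm but only a triangular part updated". As a sketch of that idea, assuming a square C, no transposes and column-major storage, the function below computes C := alpha*A*B + beta*C but reads and writes only the upper or lower triangle of C, diagonal included. `ref_dgemmt` and its `upper` flag are hypothetical names; a full GEMM-style interface would also take storage-order and transpose options, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference for the gemmt idea: C := alpha*A*B + beta*C, where
 * C is n-by-n, A is n-by-k and B is k-by-n, all column-major, no transposes.
 * Only the upper (upper == true) or lower triangle of C, diagonal included,
 * is touched; the opposite triangle keeps its previous contents. */
void ref_dgemmt(bool upper, int n, int k, double alpha,
                const double *a, int lda, const double *b, int ldb,
                double beta, double *c, int ldc)
{
    assert(n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        /* Rows of column j that lie in the selected triangle. */
        int i_begin = upper ? 0 : j;
        int i_end   = upper ? j + 1 : n;
        for (int i = i_begin; i < i_end; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * sum + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

Skipping the unused triangle avoids roughly half of the dot products, which is the point when the caller only needs one triangle, for example because the result is known to be symmetric.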

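A bfloat16 value is the upper 16 bits of an IEEE-754 single-precision float (sign, 8-bit exponent, 7 stored mantissa bits). That is why widening with `cblas_sbf16tos` or `cblas_dbf16tod` can be exact, while narrowing with `cblas_sbstobf16` or `cblas_sbdtobf16` has to round. The page does not say which rounding mode is used; the sketch below assumes round-to-nearest, ties-to-even. The helpers `float_to_bf16` and `bf16_to_float`, and storing bfloat16 as a `uint16_t`, are illustrative assumptions rather than OpenBLAS's own code or type definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative storage type: the 16 high-order bits of an IEEE-754 float. */
typedef uint16_t bf16_bits;

/* float -> bfloat16, assuming round-to-nearest, ties-to-even. */
bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                     /* reinterpret the bits */
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0)
        return (bf16_bits)((u >> 16) | 0x0040u);  /* keep NaN a quiet NaN */
    uint32_t lsb = (u >> 16) & 1u;                /* lowest surviving bit */
    u += 0x7FFFu + lsb;                           /* round half to even */
    return (bf16_bits)(u >> 16);
}

/* bfloat16 -> float is exact: the dropped mantissa bits become zero. */
float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With ties-to-even, `1.00390625f` (exactly halfway between two bfloat16 values) rounds down to `0x3F80`, while `1.01171875f` rounds up to `0x3F82`.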
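
`cblas_sbdot` takes two bfloat16 arrays but returns a `float`. One way to picture this, assumed here rather than taken from the OpenBLAS kernels, is to widen every element to float and accumulate the products in float, since bfloat16 keeps only 8 significant bits and is too coarse for a running sum. `ref_sbdot` is a hypothetical name, bfloat16 is again represented as raw `uint16_t` bits, and only positive strides are supported.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern (stored as uint16_t) to float; this is exact. */
static float widen_bf16(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical reference: dot product of n bfloat16 elements read with
 * positive strides incx and incy, accumulated in single precision. */
float ref_sbdot(int n, const uint16_t *x, int incx, const uint16_t *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += widen_bf16(x[(size_t)i * incx]) * widen_bf16(y[(size_t)i * incy]);
    return sum;
}
```

For example, with x = {1, 2, 3} and y = {4, 5, 6} encoded as bfloat16, the unit-stride result is 32.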
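
`openblas_get_config()` returns a space-separated list of build options such as `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`, and `openblas_get_parallel()` returns 0, 1 or 2. The helpers below, `config_has_option` and `parallel_mode_name`, are hypothetical and work on those return values; they are exercised with the sample string from the documentation. Treating the configuration string as whole space-separated tokens is an assumption about its format.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `option` appears as a whole space-separated
 * token in `config` (e.g. the string returned by openblas_get_config()). */
bool config_has_option(const char *config, const char *option)
{
    size_t len = strlen(option);
    assert(len > 0);
    const char *p = config;
    while ((p = strstr(p, option)) != NULL) {
        bool starts = (p == config) || (p[-1] == ' ');
        bool ends   = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p += 1;                   /* partial match, keep searching */
    }
    return false;
}

/* Hypothetical helper: name the value returned by openblas_get_parallel(). */
const char *parallel_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "sequential";
    case 1:  return "platform-based threading";
    case 2:  return "OpenMP-based threading";
    default: return "unknown";
    }
}
```

In a program linked against OpenBLAS one would pass the real values, e.g. `config_has_option(openblas_get_config(), "DYNAMIC_ARCH")`. Whole-token matching keeps `AFFINITY` from falsely matching inside `NO_AFFINITY`.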