@@ -1,4 +1,9 @@
-* BLAS-like extensions
+OpenBLAS for the most part contains implementations of the reference (Netlib)
+BLAS, CBLAS, LAPACK and LAPACKE interfaces. A few OpenBLAS-specific functions
+are also provided however, which mostly can be seen as "BLAS extensions".
+This page documents those non-standard APIs.
+
+## BLAS-like extensions
 
 | Routine | Data Types | Description |
 | ------------- |:------------- | :---------------|
@@ -9,20 +14,26 @@
 | ?geadd | s,d,c,z | matrix add |
 | ?gemmt | s,d,c,z | gemm but only a triangular part updated|
 
-* BLAS-like and Conversion functions for bfloat16 (available when OpenBLAS was compiled with BUILD_BFLOAT16=1)
- * `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
- * `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
- * `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
- * `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
- * `float cblas_sbdot` computes the dot product of two bfloat16 arrays
- * `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
- * `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
-
-* Utility functions
- * openblas_get_num_threads
- * openblas_set_num_threads
- * `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
- * `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
- * `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
- * `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the cpu affinity mask of the given thread to the provided cpuset. (Only available under Linux, with semantics identical to pthread_setaffinity_np)
+
+## bfloat16 functionality
+
+BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was compiled with `BUILD_BFLOAT16=1`):
+
+* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
+* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
+* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
+* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
+* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
+* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
+* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
+
+## Utility functions
+
+* `openblas_get_num_threads`
+* `openblas_set_num_threads`
+* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
+* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
+* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
+* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
+ to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
 