docs: improve extensions page

rgommers · rgommers · commit ca9a0c28e8c6 · 2024-06-30T18:04:12.000+02:00
diff --git a/docs/extensions.md b/docs/extensions.md
@@ -1,4 +1,9 @@
-* BLAS-like extensions
+OpenBLAS for the most part contains implementations of the reference (Netlib)
+BLAS, CBLAS, LAPACK and LAPACKE interfaces. A few OpenBLAS-specific functions
+are also provided however, which mostly can be seen as "BLAS extensions".
+This page documents those non-standard APIs.
+
+## BLAS-like extensions
 
 | Routine       | Data Types    | Description     |
 | ------------- |:------------- | :---------------|
@@ -9,20 +14,26 @@
 | ?geadd        | s,d,c,z       | matrix add   |
 | ?gemmt        | s,d,c,z       | gemm but only a triangular part updated|
 
-* BLAS-like and Conversion functions for bfloat16 (available when OpenBLAS was compiled with BUILD_BFLOAT16=1)
-  * `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
-  * `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
-  * `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
-  * `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
-  * `float cblas_sbdot` computes the dot product of two bfloat16 arrays
-  * `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16  
-  * `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
-
-* Utility functions
-  * openblas_get_num_threads
-  * openblas_set_num_threads
-  * `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
-  * `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
-  * `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
-  * `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the cpu affinity mask of the given thread to the provided cpuset. (Only available under Linux, with semantics identical to pthread_setaffinity_np)
+
+## bfloat16 functionality
+
+BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was compiled with `BUILD_BFLOAT16=1`):
+
+* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
+* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
+* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
+* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
+* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
+* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
+* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
+
+## Utility functions
+
+* `openblas_get_num_threads`
+* `openblas_set_num_threads`
+* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
+* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
+* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
+* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
+  to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.