
Commit d8e6e2b

Merge branch 'develop' into dynamicDispatchAIXandClang
2 parents badfb2e + 0de786c commit d8e6e2b


3 files changed (+30, -8 lines)

.cirrus.yml

Lines changed: 10 additions & 0 deletions
@@ -148,6 +148,16 @@ FreeBSD_task:
     - ls -l /usr/local/lib
     - gmake CC=gcc INTERFACE64=1
 
+FreeBSD_task:
+  name: FreeBSD-clang-openmp
+  freebsd_instance:
+    image_family: freebsd-13-2
+  install_script:
+    - pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
+    - ln -s /usr/local/lib/gcc12/libgfortran.so.5.0.0 /usr/lib/libgfortran.so
+  compile_script:
+    - gmake CC=clang FC=gfortran USE_OPENMP=1 CPP_THREAD_SAFETY_TEST=1
+
 #task:
 # name: Windows/LLVM16 --- too slow ---
 # windows_container:

README.md

Lines changed: 16 additions & 8 deletions
@@ -54,10 +54,15 @@ Building OpenBLAS requires the following to be installed:
 
 Simply invoking `make` (or `gmake` on BSD) will detect the CPU automatically.
 To set a specific target CPU, use `make TARGET=xxx`, e.g. `make TARGET=NEHALEM`.
-The full target list is in the file `TargetList.txt`. For building with `cmake`, the
-usual conventions apply, i.e. create a build directory either underneath the toplevel
-OpenBLAS source directory or separate from it, and invoke `cmake` there with the path
-to the source tree and any build options you plan to set.
+The full target list is in the file `TargetList.txt`, other build options are documented in Makefile.rule and
+can either be set there (typically by removing the comment character from the respective line), or used on the
+`make` command line.
+Note that when you run `make install` after building, you need to repeat all command line options you provided to `make`
+in the build step, as some settings like the supported maximum number of threads are automatically derived from the
+build host by default, which might not be what you want.
+For building with `cmake`, the usual conventions apply, i.e. create a build directory either underneath the toplevel
+OpenBLAS source directory or separate from it, and invoke `cmake` there with the path to the source tree and any
+build options you plan to set.
 
 ### Cross compile
 
@@ -117,7 +122,7 @@ Use `PREFIX=` when invoking `make`, for example
 ```sh
 make install PREFIX=your_installation_directory
 ```
-
+(along with all options you added on the `make` command line in the preceding build step)
 The default installation directory is `/opt/OpenBLAS`.
 
 ## Supported CPUs and Operating Systems
@@ -137,7 +142,7 @@ Please read `GotoBLAS_01Readme.txt` for older CPU models already supported by th
 - **AMD Bulldozer**: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
 - **AMD PILEDRIVER**: Uses Bulldozer codes with some optimizations.
 - **AMD STEAMROLLER**: Uses Bulldozer codes with some optimizations.
-- **AMD ZEN**: Uses Haswell codes with some optimizations.
+- **AMD ZEN**: Uses Haswell codes with some optimizations for Zen 2/3 (use SkylakeX for Zen4)
 
 #### MIPS32
 
@@ -169,13 +174,16 @@ Please read `GotoBLAS_01Readme.txt` for older CPU models already supported by th
 - **TSV110**: Optimized some Level-3 helper functions
 - **EMAG 8180**: preliminary support based on A57
 - **Neoverse N1**: (AWS Graviton2) preliminary support
-- **Apple Vortex**: preliminary support based on ARMV8
+- **Neoverse V1**: (AWS Graviton3) optimized Level-3 BLAS
+- **Apple Vortex**: preliminary support based on ThunderX2/3
+- **A64FX**: preliminary support, optimized Level-3 BLAS
+- **ARMV8SVE**: any ARMV8 cpu with SVE extensions
 
 #### PPC/PPC64
 
 - **POWER8**: Optimized BLAS, only for PPC64LE (Little Endian), only with `USE_OPENMP=1`
 - **POWER9**: Optimized Level-3 BLAS (real) and some Level-1,2. PPC64LE with OpenMP only.
-- **POWER10**:
+- **POWER10**: Optimized Level-3 BLAS including SBGEMM and some Level-1,2.
 
 #### IBM zEnterprise System
 
common_arm64.h

Lines changed: 4 additions & 0 deletions
@@ -162,7 +162,11 @@ static inline int blas_quickdivide(blasint x, blasint y){
 #define HUGE_PAGESIZE ( 4 << 20)
 
 #ifndef BUFFERSIZE
+#if defined(NEOVERSEN1) || defined(NEOVERSEN2) || defined(NEOVERSEV1) || defined(A64FX) || defined(ARMV8SVE)
+#define BUFFER_SIZE (32 << 22)
+#else
 #define BUFFER_SIZE (32 << 20)
+#endif
 #else
 #define BUFFER_SIZE (32 << BUFFERSIZE)
 #endif
