Commit 2fefdfa

Merge branch 'OpenMathLib:develop' into azurewincl
2 parents a5c04e3 + b70227a commit 2fefdfa

17 files changed: 999 additions and 630 deletions

.github/workflows/codspeed-bench.yml

Lines changed: 7 additions & 0 deletions
@@ -139,6 +139,13 @@ jobs:
           cd build/openblas_wrap
           python -c'import _flapack; print(dir(_flapack))'

+      - name: Run benchmarks under pytest-benchmark
+        run: |
+          cd benchmark/pybench
+          pip install pytest-benchmark
+          export PYTHONPATH=$PWD/build-install/lib/python${{matrix.pyver}}/site-packages/
+          OPENBLAS_NUM_THREADS=1 pytest benchmarks/bench_blas.py -k 'gesdd'
+
       - name: Run benchmarks
         uses: CodSpeedHQ/action@v2
         with:

.github/workflows/docs.yml

Lines changed: 18 additions & 3 deletions
@@ -1,21 +1,36 @@
 name: Publish docs via GitHub Pages
+
 on:
   push:
     branches:
       - develop
+  pull_request:
+    branches:
+      - develop
+
 jobs:
   build:
     name: Deploy docs
     if: "github.repository == 'OpenMathLib/OpenBLAS'"
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
       - uses: actions/setup-python@v5
         with:
           python-version: "3.10"
-      - run: pip install mkdocs mkdocs-material
-      # mkdocs gh-deploy command only builds to the top-level, hence building then deploying ourselves
-      - run: mkdocs build
+
+      - name: Install MkDocs and doc theme packages
+        run: pip install mkdocs mkdocs-material mkdocs-git-revision-date-localized-plugin
+
+      - name: Build docs site
+        run: mkdocs build
+
+      # mkdocs gh-deploy command only builds to the top-level, hence deploying
+      # with this action instead.
+      # Deploys to http://www.openmathlib.org/OpenBLAS/docs/
       - name: Deploy docs
         uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
         if: ${{ github.ref == 'refs/heads/develop' }}

azure-pipelines.yml

Lines changed: 20 additions & 20 deletions
@@ -141,21 +141,21 @@ jobs:

 - job: OSX_OpenMP
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   steps:
   - script: |
       brew update
-      make TARGET=CORE2 DYNAMIC_ARCH=1 USE_OPENMP=1 INTERFACE64=1 CC=gcc-10 FC=gfortran-10
-      make TARGET=CORE2 DYNAMIC_ARCH=1 USE_OPENMP=1 INTERFACE64=1 CC=gcc-10 FC=gfortran-10 PREFIX=../blasinst install
+      make TARGET=CORE2 DYNAMIC_ARCH=1 USE_OPENMP=1 INTERFACE64=1 CC=gcc-13 FC=gfortran-13
+      make TARGET=CORE2 DYNAMIC_ARCH=1 USE_OPENMP=1 INTERFACE64=1 CC=gcc-13 FC=gfortran-13 PREFIX=../blasinst install
       ls -lR ../blasinst

 - job: OSX_GCC_Nothreads
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   steps:
   - script: |
       brew update
-      make USE_THREADS=0 CC=gcc-10 FC=gfortran-10
+      make USE_THREADS=0 CC=gcc-13 FC=gfortran-13

 - job: OSX_GCC12
   pool:
@@ -195,15 +195,15 @@ jobs:

 - job: OSX_dynarch_cmake
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   variables:
     LD_LIBRARY_PATH: /usr/local/opt/llvm/lib
     LIBRARY_PATH: /usr/local/opt/llvm/lib
   steps:
   - script: |
       mkdir build
       cd build
-      cmake -DTARGET=CORE2 -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST='NEHALEM HASWELL SKYLAKEX' -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_Fortran_COMPILER=gfortran-10 -DBUILD_SHARED_LIBS=ON ..
+      cmake -DTARGET=CORE2 -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST='NEHALEM HASWELL SKYLAKEX' -DCMAKE_C_COMPILER=gcc-13 -DCMAKE_Fortran_COMPILER=gfortran-13 -DBUILD_SHARED_LIBS=ON ..
       cmake --build .
       ctest

@@ -242,7 +242,7 @@ jobs:

 - job: OSX_NDK_ARMV7
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   steps:
   - script: |
       brew update
@@ -252,35 +252,35 @@ jobs:

 - job: OSX_IOS_ARMV8
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   variables:
-    CC: /Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-    CFLAGS: -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0
+    CC: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+    CFLAGS: -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_14.2.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.2.sdk -arch arm64 -miphoneos-version-min=10.0
   steps:
   - script: |
       make TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1

 - job: OSX_IOS_ARMV7
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   variables:
-    CC: /Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-    CFLAGS: -O2 -mno-thumb -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch armv7 -miphoneos-version-min=5.1
+    CC: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+    CFLAGS: -O2 -mno-thumb -Wno-macro-redefined -isysroot /Applications/Xcode_14.2.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.2.sdk -arch armv7 -miphoneos-version-min=5.1
   steps:
   - script: |
       make TARGET=ARMV7 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1

 - job: OSX_xbuild_DYNAMIC_ARM64
   pool:
-    vmImage: 'macOS-11'
+    vmImage: 'macOS-12'
   variables:
-    CC: /Applications/Xcode_12.5.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-    CFLAGS: -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.5.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.3.sdk -arch arm64
+    CC: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+    CFLAGS: -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -arch arm64
   steps:
   - script: |
-      ls /Applications/Xcode_12.5.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
-      /Applications/Xcode_12.5.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -arch arm64 --print-supported-cpus
-      /Applications/Xcode_11.7.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang --version
+      ls /Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
+      /Applications/Xcode_12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -arch arm64 --print-supported-cpus
+      /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang --version
       make TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1

 - job: ALPINE_MUSL

benchmark/pybench/benchmarks/bench_blas.py

Lines changed: 8 additions & 5 deletions
@@ -234,11 +234,14 @@ def test_gesdd(benchmark, mn, variant):
     gesdd = ow.get_func('gesdd', variant)
     u, s, vt, info = benchmark(run_gesdd, a, lwork, gesdd)

-    assert info == 0
-
-    atol = {'s': 1e-5, 'd': 1e-13}
-
-    np.testing.assert_allclose(u @ np.diag(s) @ vt, a, atol=atol[variant])
+    if variant != 's':
+        # On entry to SLASCL parameter number 4 had an illegal value
+        # under codspeed (cannot repro locally or on CI w/o codspeed)
+        # https://github.com/OpenMathLib/OpenBLAS/issues/4776
+        assert info == 0
+
+        atol = {'s': 1e-5, 'd': 1e-13}
+        np.testing.assert_allclose(u @ np.diag(s) @ vt, a, atol=atol[variant])


 # linalg.eigh

cmake/prebuild.cmake

Lines changed: 9 additions & 0 deletions
@@ -1309,6 +1309,15 @@ endif ()
       "#define DTB_DEFAULT_ENTRIES 128\n"
       "#define DTB_SIZE 4096\n"
       "#define L2_ASSOCIATIVE 8\n")
+  elseif ("${TCORE}" STREQUAL "RISCV64_GENERIC")
+    file(APPEND ${TARGET_CONF_TEMP}
+      "#define L1_DATA_SIZE 32768\n"
+      "#define L1_DATA_LINESIZE 32\n"
+      "#define L2_SIZE 1048576\n"
+      "#define L2_LINESIZE 32 \n"
+      "#define DTB_DEFAULT_ENTRIES 128\n"
+      "#define DTB_SIZE 4096\n"
+      "#define L2_ASSOCIATIVE 4\n")
   endif()
   set(SBGEMM_UNROLL_M 8)
   set(SBGEMM_UNROLL_N 4)

cmake/system.cmake

Lines changed: 1 addition & 1 deletion
@@ -615,7 +615,7 @@ if (${CMAKE_SYSTEM_NAME} STREQUAL "Windows")
 endif ()

 if (CMAKE_Fortran_COMPILER)
-  if (${F_COMPILER} STREQUAL "NAG" OR ${F_COMPILER} STREQUAL "CRAY" OR CMAKE_Fortran_COMPILER_ID MATCHES "LLVMFlang.*")
+  if ("${F_COMPILER}" STREQUAL "NAG" OR "${F_COMPILER}" STREQUAL "CRAY" OR CMAKE_Fortran_COMPILER_ID MATCHES "LLVMFlang.*")
     set(FILTER_FLAGS "-msse3;-mssse3;-msse4.1;-mavx;-mavx2,-mskylake-avx512")
     if (CMAKE_Fortran_COMPILER_ID MATCHES "LLVMFlang.*")
       message(STATUS "removing fortran flags")

common_x86_64.h

Lines changed: 1 addition & 1 deletion
@@ -253,7 +253,7 @@ static __inline unsigned int blas_quickdivide(unsigned int x, unsigned int y){
 #ifndef BUFFERSIZE
 #define BUFFER_SIZE (32 << 22)
 #else
-#define BUFFER_SIZE (32 << BUFFERSIZE)
+#define BUFFER_SIZE (32UL << BUFFERSIZE)
 #endif

 #define SEEK_ADDRESS
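
The `32UL` change above widens the type in which the shift is evaluated. The following is a minimal sketch of why that matters, not code from OpenBLAS: the helper names `buffer_size_32bit` and `buffer_size_64bit` are hypothetical, and fixed-width unsigned types stand in for `int` and `unsigned long` so that the wraparound is well defined and the widths are explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; they are not part of OpenBLAS. */

/* 32 << shift evaluated in 32-bit unsigned arithmetic: bits shifted past
   bit 31 are discarded. Unsigned types are used so the wraparound is well
   defined (the same overflow on a signed int is undefined behaviour). */
static uint32_t buffer_size_32bit(unsigned shift) {
  assert(shift < 32);
  return (uint32_t)32 << shift;
}

/* The same expression evaluated in 64-bit arithmetic, which is what
   32UL << shift gives on platforms where unsigned long is 64 bits wide. */
static uint64_t buffer_size_64bit(unsigned shift) {
  assert(shift < 59); /* 32 is 2^5, so the result stays below 2^64 */
  return (uint64_t)32 << shift;
}
```

With the default shift of 22 both widths agree, but a shift of 27 already needs 33 bits. Note that `unsigned long` is only 32 bits wide on 64-bit Windows (LLP64), so the `UL` suffix widens the expression on LP64 systems such as Linux and macOS but not there.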

cpuid_x86.c

Lines changed: 4 additions & 3 deletions
@@ -1529,6 +1529,7 @@ int get_cpuname(void){
       switch (model) {
       case 5: // Comet Lake H and S
       case 6: // Comet Lake U
+      case 10: // Meteor Lake
         if(support_avx2())
           return CPUTYPE_HASWELL;
         if(support_avx())
@@ -2391,10 +2392,10 @@ int get_coretype(void){
         else
           return CORE_NEHALEM;
       }
-    case 15:
-      if (model <= 0x2) return CORE_NORTHWOOD;
-      else return CORE_PRESCOTT;
     }
+    case 15:
+      if (model <= 0x2) return CORE_NORTHWOOD;
+      else return CORE_PRESCOTT;
     }
   }

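
The second hunk above appears to move `case 15:` out by one closing brace, so that the label belongs to an enclosing `switch` rather than a nested one. A `case` label always binds to the innermost enclosing `switch`, which the sketch below illustrates. It is an assumption-laden toy, not the OpenBLAS code: the function names, the `family` and `model` parameters, and the return values are all hypothetical.

```c
/* Hypothetical illustration only; names and return values are made up. */

/* "case 15" sits inside the inner switch, so it is compared against
   model, not family. A call with family == 15 never reaches it. */
static int classify_nested(int family, int model) {
  switch (family) {
  case 6:
    switch (model) {
    case 1:
      return 61;
    case 15:
      return 150;
    }
    break;
  }
  return 0; /* unknown */
}

/* "case 15" moved after the inner switch's closing brace, so it is now
   a label of the outer switch and is compared against family. */
static int classify_outer(int family, int model) {
  switch (family) {
  case 6:
    switch (model) {
    case 1:
      return 61;
    }
    break;
  case 15:
    return 150;
  }
  return 0; /* unknown */
}
```

In this toy, `classify_nested(15, 0)` falls through to the unknown result, while `classify_outer(15, 0)` reaches the `case 15` branch.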

docs/about.md

Lines changed: 32 additions & 12 deletions
@@ -2,25 +2,45 @@

 We have a [GitHub discussions](https://github.com/OpenMathLib/OpenBLAS/discussions/) forum to discuss usage and development of OpenBLAS. We also have a [Google group for *users*](https://groups.google.com/forum/#!forum/openblas-users) and a [Google group for *development of*](https://groups.google.com/forum/#!forum/openblas-dev) OpenBLAS.

-## Donations
-
-You can read OpenBLAS statement of receipts and disbursement and cash balance on [google doc](https://docs.google.com/spreadsheet/ccc?key=0AghkTjXe2lDndE1UZml0dGpaUzJmZGhvenBZd1F2R1E&usp=sharing). A backer list is available [on GitHub](https://github.com/OpenMathLib/OpenBLAS/blob/develop/BACKERS.md).
-
-We welcome the hardware donation, including the latest CPU and boards.
-
 ## Acknowledgements

-This work is partially supported by
+This work was or is partially supported by the following grants, contracts and institutions:
+
 * Research and Development of Compiler System and Toolchain for Domestic CPU, National S&T Major Projects: Core Electronic Devices, High-end General Chips and Fundamental Software (No.2009ZX01036-001-002)
 * National High-tech R&D Program of China (Grant No.2012AA010903)
+* [PerfXLab](http://www.perfxlab.com/)
+* Chan Zuckerberg Initiative's Essential Open Source Software for Science program:
+    * Cycle 1 grant: [Strengthening NumPy's foundations - growing beyond code](https://figshare.com/articles/journal_contribution/Proposal_NumPy_OpenBLAS_for_Chan_Zuckerberg_Initiative_EOSS_2019_round_1/10302167) (2019-2020)
+    * Cycle 3 grant: [Improving usability and sustainability for NumPy and OpenBLAS](https://chanzuckerberg.com/eoss/proposals/improving-usability-and-sustainability-for-numpy-and-openblas/) (2020-2021)
+* Sovereign Tech Fund funding: [Keeping high performance linear algebra computation accessible and open for all](https://www.sovereigntechfund.de/tech/openblas) (2023-2024)
+
+Over the course of OpenBLAS development, a number of donations were received.
+You can read OpenBLAS's statement of receipts and disbursement and cash balance in
+[this Google doc](https://docs.google.com/spreadsheet/ccc?key=0AghkTjXe2lDndE1UZml0dGpaUzJmZGhvenBZd1F2R1E&usp=sharing) (covers 2013-2016).
+A list of backers is available [in BACKERS.md](https://github.com/OpenMathLib/OpenBLAS/blob/develop/BACKERS.md) in the main repo.
+
+### Donations
+
+We welcome hardware donations, including the latest CPUs and motherboards.
+
+
+## Open source users of OpenBLAS
+
+Prominent open source users of OpenBLAS include:
+
+* [Julia](https://julialang.org) - a high-level, high-performance dynamic programming language for technical computing
+* [NumPy](https://numpy.org) - the fundamental package for scientific computing with Python
+* [SciPy](https://scipy.org) - fundamental algorithms for scientific computing in Python
+* [R](https://www.r-project.org/) - a free software environment for statistical computing and graphics
+* [OpenCV](https://opencv.org/) - the world's biggest computer vision library

-## Users of OpenBLAS
+OpenBLAS is packaged in most major Linux distros, as well as general and
+numerical computing-focused packaging ecosystems like Nix, Homebrew, Spack and
+conda-forge.

-* <a href='http://julialang.org/'>Julia - a high-level, high-performance dynamic programming language for technical computing</a><br />
-* Ceemple v1.0.3 (C++ technical computing environment), including OpenBLAS, Qt, Boost, OpenCV and others. The only solution with immediate-recompilation of C++ code. Available from <a href='http://www.ceemple.com'>Ceemple C++ Technical Computing</a>.
-* [netlib-java](https://github.com/fommil/netlib-java) and various upstream libraries, allowing OpenBLAS to be used from languages on the Java Virtual Machine.
+OpenBLAS is used directly by libraries written in C, C++ and Fortran (and
+probably other languages), and directly by end users in those languages.

-<!-- TODO: academia users, industry users, hpc centers deployed openblas, etc. -->

 ## Publications


docs/build_system.md

Lines changed: 23 additions & 7 deletions
@@ -1,3 +1,8 @@
+This page describes the Make-based build, which is the default/authoritative
+build method. Note that the OpenBLAS repository also supports building with
+CMake (not described here) - that generally works and is tested, however there
+may be small differences between the Make and CMake builds.
+
 !!! warning
     This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself.

@@ -95,10 +100,21 @@ NUM_PARALLEL - define this to the number of OpenMP instances that your code m
 ```


-OpenBLAS uses a fixed set of memory buffers internally, used for communicating and compiling partial results from individual threads.
-For efficiency, the management array structure for these buffers is sized at build time - this makes it necessary to know in advance how
-many threads need to be supported on the target system(s).
-With OpenMP, there is an additional level of complexity as there may be calls originating from a parallel region in the calling program. If OpenBLAS gets called from a single parallel region, it runs single-threaded automatically to avoid overloading the system by fanning out its own set of threads.
-In the case that an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not
-sufficient as the additional caller(s) would compete for the original set of buffers already in use by the first call.
-So if multiple OpenMP runtimes call into OpenBLAS at the same time, then only one of them will be able to make progress while all the rest of them spin-wait for the one available buffer. Setting NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that there are a sufficient number of buffer sets available
+OpenBLAS uses a fixed set of memory buffers internally, used for communicating
+and compiling partial results from individual threads. For efficiency, the
+management array structure for these buffers is sized at build time - this
+makes it necessary to know in advance how many threads need to be supported on
+the target system(s).
+
+With OpenMP, there is an additional level of complexity as there may be calls
+originating from a parallel region in the calling program. If OpenBLAS gets
+called from a single parallel region, it runs single-threaded automatically to
+avoid overloading the system by fanning out its own set of threads. In the case
+that an OpenMP program makes multiple calls from independent regions or
+instances in parallel, this default serialization is not sufficient as the
+additional caller(s) would compete for the original set of buffers already in
+use by the first call. So if multiple OpenMP runtimes call into OpenBLAS at the
+same time, then only one of them will be able to make progress while all the
+rest of them spin-wait for the one available buffer. Setting `NUM_PARALLEL` to
+the upper bound on the number of OpenMP runtimes that you can have in a process
+ensures that there are a sufficient number of buffer sets available.
