Numba Configuration ⚡
Numba has three compatible threading layers: (1) OpenMP, (2) Intel's TBB, and (3) a generic cross-platform implementation. By default it chooses TBB, for thread safety purposes. However, PyExaFMM relies on the OpenMP backend.
Thread safety becomes a concern when a user defines a thread pool whose workers themselves instantiate threads. An example of where this might occur is calling a NumPy or SciPy linear algebra function (which dispatches to a multithreaded BLAS/LAPACK routine, typically backed by an OpenMP runtime) from within a parallel for loop in a Numba routine.
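As an illustration, here is a minimal sketch (not taken from the PyExaFMM source) of this kind of nested parallelism: a Numba parallel loop whose body calls a BLAS-backed NumPy routine, so each worker thread may itself spawn BLAS threads.

```python
import numpy as np
import numba


@numba.njit(parallel=True)
def gram_sums(blocks):
    # Hypothetical example: the outer loop is parallelised by Numba's
    # threading layer, while np.dot dispatches to BLAS, which may start
    # its own (OpenMP) thread pool inside every worker thread.
    out = np.empty(blocks.shape[0])
    for i in numba.prange(blocks.shape[0]):
        out[i] = np.dot(blocks[i], blocks[i]).sum()
    return out


blocks = np.random.rand(100, 32, 32)
gram_sums(blocks)
```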
To avoid this, we restrict the allowed number of threads in an OpenMP parallel region to 1.
export OMP_NUM_THREADS=1
We find that in practice the OpenMP backend for Numba works better for PyExaFMM's use case than TBB.
export NUMBA_THREADING_LAYER='omp'
Our use case is relatively simple, involving plain loops over arrays with some linear algebra, so each thread has a very similar amount of work. TBB contains complex work-stealing algorithms that optimise unbalanced workloads across threads well; this appears to introduce significant overhead in PyExaFMM. As an example, a Laplace problem with expansion order (multipole and local) p=5, 1,000,000 randomly distributed points, and target rank k=50 takes 13 seconds with OpenMP, compared to 65 seconds with TBB.
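To confirm that these settings have taken effect, one option (a sketch, not part of the PyExaFMM documentation) is to set the environment variables programmatically before importing Numba, then query which threading layer was actually selected after the first parallel execution.

```python
import os

# These must be set before numba (and the BLAS used by numpy) initialise
# their thread pools, so set them at the very top of the program.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["NUMBA_THREADING_LAYER"] = "omp"

import numpy as np
import numba


@numba.njit(parallel=True)
def double(x):
    out = np.empty_like(x)
    for i in numba.prange(x.shape[0]):
        out[i] = 2.0 * x[i]
    return out


double(np.ones(10))             # the threading layer is chosen on first parallel run
print(numba.threading_layer())  # expected to print 'omp'
```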
- libtbb is not necessarily included with Ubuntu, and Numba also requires libtbb-dev. Check whether TBB is configured correctly using
numba -s