Numba Configuration ⚡

Numba's Threading Layer

Numba has three compatible threading layers: (1) OpenMP, (2) Intel's TBB, and (3) a generic cross-platform implementation. By default it chooses TBB, when available, for thread-safety reasons. However, PyExaFMM relies on the OpenMP backend.

Nested Parallelism

This is where a user defines a thread pool whose workers themselves instantiate threads. An example of where this might occur is calling a NumPy or SciPy linear algebra function (which dispatches to a multithreaded BLAS/LAPACK routine, using an OpenMP runtime) from within a parallel for loop in a Numba routine.

To avoid this, we restrict the allowed number of threads in an OpenMP parallel region to 1.

export OMP_NUM_THREADS=1
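
The pattern being guarded against looks roughly like the following sketch. The kernel and array sizes are hypothetical, and whether `np.dot` actually spawns extra threads depends on how the underlying BLAS library was built and configured (Numba's `np.dot` support also requires SciPy to be installed).

```python
import numpy as np
from numba import njit, prange


@njit(parallel=True)
def apply_operators(operators, vectors):
    # Hypothetical kernel: each Numba worker applies a dense operator to a vector.
    n = operators.shape[0]
    results = np.empty_like(vectors)
    for i in prange(n):
        # np.dot dispatches to BLAS/LAPACK here; with OMP_NUM_THREADS > 1 the
        # BLAS runtime may launch its own threads inside each Numba worker,
        # oversubscribing the machine.
        results[i] = np.dot(operators[i], vectors[i])
    return results


if __name__ == "__main__":
    operators = np.random.rand(8, 32, 32)
    vectors = np.random.rand(8, 32)
    print(apply_operators(operators, vectors).shape)
```

With OMP_NUM_THREADS=1 the BLAS calls inside each worker stay single-threaded, so the parallelism comes from Numba's own thread pool alone.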

In practice, we find that Numba's OpenMP backend works better for PyExaFMM's use case than TBB.

export NUMBA_THREADING_LAYER='omp'
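
The same choice can be made programmatically before the first parallel function is compiled, and verified afterwards with `numba.threading_layer()`. This is a minimal sketch using the standard Numba API; the example kernel itself is just for illustration.

```python
import numpy as np
from numba import config, njit, prange, threading_layer

# Request the OpenMP threading layer before any parallel function is compiled.
config.THREADING_LAYER = 'omp'


@njit(parallel=True)
def scaled_sum(x, y):
    out = np.empty_like(x)
    for i in prange(x.shape[0]):
        out[i] = 2.0 * x[i] + y[i]
    return out


x = np.arange(10.0)
y = np.ones(10)
scaled_sum(x, y)

# Reports the layer actually selected, e.g. 'omp'. Only meaningful after a
# parallel function has executed.
print(threading_layer())
```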

Our use case is relatively simple, consisting of plain loops over arrays with some linear algebra, so each thread has a very similar amount of work. TBB contains complex work-stealing algorithms that balance uneven workloads across threads well, but this appears to carry a significant overhead in PyExaFMM. As an example, a Laplace problem with expansion order (multipole and local) p=5, 1,000,000 randomly distributed points, and target rank k=50 takes 13 seconds with OpenMP, compared to 65 seconds with TBB.

Notes:

  • libtbb is not necessarily included with Ubuntu, and Numba also requires libtbb-dev. Check whether TBB is configured correctly using numba -s