Replies: 7 comments
-
Hi Miguel, are you observing this 1e-9 discrepancy with double-precision data or with single precision? Cheers,
-
Hi @mreineck, I had the larger arrays in single precision to save memory. I put everything in double precision and I still get the same problem: about 0.03% of the pixels still show a difference bigger than 1e-9 between the output and input maps. Furthermore, this is the output of np.testing.assert_almost_equal(input_maps, output_maps, decimal=9, verbose=True):
So, in (at least) one of the pixels, the difference is as bad as 1e-4.
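For anyone reproducing this, a quick way to count the offending pixels is sketched below (input_maps / output_maps stand in for the real hwp_sys maps; the random data here is only a placeholder):

```python
import numpy as np

# Placeholder arrays; in the real test these are the hwp_sys input/output maps.
rng = np.random.default_rng(0)
input_maps = rng.standard_normal((3, 12 * 256**2))
output_maps = input_maps + rng.normal(scale=1e-10, size=input_maps.shape)

diff = np.abs(output_maps - input_maps)
n_bad = np.count_nonzero(diff > 1e-9)
print(f"{n_bad} pixels ({100 * n_bad / diff.size:.4f}%) above 1e-9; worst = {diff.max():.2e}")

# assert_allclose reports both the number and the size of the mismatches:
np.testing.assert_allclose(output_maps, input_maps, rtol=0.0, atol=1e-9)
```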
-
Hi @mj-gomes, good catch! Yes, instruction reordering might be the culprit here. Could you try using …? Otherwise, we could investigate the machine code produced by Numba using …
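For reference, Numba's compiled dispatchers expose the generated LLVM IR and assembly; a minimal sketch of that kind of inspection (the kernel below is a toy, not one of the hwp_sys functions):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def scale(x, a):
    out = np.empty_like(x)
    for i in prange(x.size):
        out[i] = a * x[i]
    return out

scale(np.ones(10), 2.0)  # trigger compilation for (float64[:], float64)

# inspect_asm() / inspect_llvm() map each compiled signature to the code Numba emitted.
for sig, asm in scale.inspect_asm().items():
    print(sig)
    print("\n".join(asm.splitlines()[:20]))  # just the first few lines
```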
-
Since the error is so large, we probably need to consider another potential source...
-
I dealt with similar problems recently while validating my map-maker against the binner of lbsim. While it is true that changing the dtype from f32 to f64 results in slightly different pointing angles, it would not change many pixel indices (assuming a small nside). On the other hand, the change in dtype also changes the polarization angles (including the orientation angle, hwp_angle, etc.). And since during map-making we compute, for each pixel, the sum of the sine, cosine, sine^2, cosine^2, and sine*cosine of the angles, the difference between the results from the two dtypes adds up quickly and becomes significantly large. Another point is that on-the-fly pointing computation happens in single precision. This includes the pointing computation in … Also, numpy recommends using … Anyway, in many of the tests I implemented in my code, the best relative difference I get from f32 is …
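To make the accumulation argument concrete, here is a small illustration (made-up angles and pixel indices, not lbsim code) of how the per-pixel sums of trigonometric terms drift between float32 and float64:

```python
import numpy as np

rng = np.random.default_rng(42)
npix = 12 * 64**2                        # hypothetical nside = 64 map
nsamp = 2_000_000
pix = rng.integers(0, npix, size=nsamp)  # pixel hit by each sample
psi = rng.uniform(0.0, 2 * np.pi, size=nsamp)  # stand-in polarization angles

def per_pixel_sums(angles, dtype):
    # Accumulate sum(sin(2*psi)) per pixel, with everything cast to `dtype`.
    a = angles.astype(dtype)
    s = np.zeros(npix, dtype=dtype)
    np.add.at(s, pix, np.sin(2 * a))
    return s

s32 = per_pixel_sums(psi, np.float32)
s64 = per_pixel_sums(psi, np.float64)
print("max per-pixel |f32 - f64| of sum(sin 2psi):", np.abs(s32 - s64).max())
```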
-
Thank you all for your help. This is what I have tried so far.
- …
- …
- Also, @mreineck, if I understood your comment correctly, I think the problem might not be there, as the angles passed to the ang2pix function are in double precision (at least I am calling get_pointings with …).
- p.s. This is not very important, I believe, but I also tried and noticed that the percentage of pixels with error increases with the value of NUMBA_NUM_THREADS (see the sketch after this list), which I suppose makes sense: as we increase the number of threads, numba reorders the operations more, and the result becomes more "chaotic" in terms of floating-point rounding.
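As an illustration of the thread-count dependence, a toy parallel reduction (not the hwp_sys kernels; note that numba.set_num_threads can only go up to the NUMBA_NUM_THREADS the process started with):

```python
import numpy as np
import numba
from numba import njit, prange

@njit(parallel=True)
def parallel_sum(x):
    total = 0.0
    for i in prange(x.size):
        total += x[i]
    return total

x = np.random.default_rng(0).standard_normal(10_000_000)
reference = float(np.sum(x))  # NumPy's pairwise summation as a reference

for n in range(1, numba.config.NUMBA_NUM_THREADS + 1):
    numba.set_num_threads(n)
    print(n, "threads -> |sum - reference| =", abs(parallel_sum(x) - reference))
```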
-
Hi, so, this problem comes up only in the computation of A^T A and A^T d (the mapmaking-on-the-fly method). The reason for this is specified in this comment:
When we compute A^T A and A^T d in parallel, different threads will be accessing the same pixel at the same time, resulting in a race condition. Luckily, this mapmaking-on-the-fly method is just a convenience and will not be the one used for simulations or real data analysis, so this problem can be ignored. I will set the default to parallel=False in this method anyway so that no one runs into this problem. The mapmaking on the fly will be slower than it ideally could be, but at least we now know that the TOD computation methods, which are the really important ones in hwp_sys.py, can be successfully parallelized without loss of precision. I will close this discussion as resolved, but feel free to add comments if something comes up.
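For anyone who hits this later, a minimal sketch of the failure mode and one race-free alternative (illustrative only; bin_map_racy / bin_map_partial are made-up names, not the hwp_sys functions):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def bin_map_racy(pix, tod, npix):
    # Race: two threads can read signal[p] at the same time and one update is
    # lost, so the binned map can change from run to run.
    signal = np.zeros(npix)
    hits = np.zeros(npix)
    for i in prange(pix.size):
        p = pix[i]
        signal[p] += tod[i]
        hits[p] += 1.0
    return signal, hits

@njit(parallel=True)
def bin_map_partial(pix, tod, npix, nchunks):
    # Race-free alternative: one partial map per chunk, combined afterwards.
    partial_sig = np.zeros((nchunks, npix))
    partial_hit = np.zeros((nchunks, npix))
    chunk = (pix.size + nchunks - 1) // nchunks
    for c in prange(nchunks):
        for i in range(c * chunk, min((c + 1) * chunk, pix.size)):
            p = pix[i]
            partial_sig[c, p] += tod[i]
            partial_hit[c, p] += 1.0
    return partial_sig, partial_hit

rng = np.random.default_rng(0)
npix = 12 * 16**2
pix = rng.integers(0, npix, size=1_000_000)
tod = rng.standard_normal(pix.size)

partial_sig, partial_hit = bin_map_partial(pix, tod, npix, 16)
signal, hits = partial_sig.sum(axis=0), partial_hit.sum(axis=0)
```

The per-chunk version costs extra memory (one partial map per chunk) but gives a result that does not depend on how the threads interleave.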
-
Hello everyone.
I came across an interesting problem with numba parallelization and its influence on the precision of the computations, at least for the TOD and mapmaking operations in hwp_sys.py.
Context: I was running some tests for the hwp_sys.py code and the tests kept failing, even though I was only testing the ideal case and I knew (from looking at the maps) that it should work: the output maps should be equal to the input maps. The problem was that a percentage of the pixels in (output maps - input maps) was not within the 1e-9 tolerance that I had set to allow for possible rounding. This percentage became smaller when increasing the nside.
Origin: I found out that this comes from the @njit(parallel=True) decorator, specifically the "parallel=True" part. Because addition and multiplication of floating-point numbers are not associative, and numba in parallel mode apparently changes the order of the operations, rounding errors accumulate and end up being fairly important. At least this is the best explanation I could find. And indeed, with parallel=False I get 100% equality within the 1e-9 tolerance.
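A quick way to see the non-associativity (plain Python, nothing specific to hwp_sys):

```python
# Floating-point addition is not associative: the grouping changes the rounding.
a, b, c = 1.0e16, -1.0e16, 0.1
print((a + b) + c)  # 0.1
print(a + (b + c))  # 0.0, because b + c rounds back to -1.0e16
```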
For nside 256, for example, using 2 detectors in a one-year simulation, I had about 0.03% of the pixels above the 1e-9 tolerance. This is a very small percentage of the pixels, but it should increase with the number of detectors. For now I wrote a test method that ignores this, but I was talking with @hivon about it and it would be good to know if anyone has already run into this problem and has a good solution. Setting parallel=False is, in my opinion, not an option, because the parallelization speeds things up by a significant amount (I can run some tests to obtain specific values for this speedup if needed).
I don't know much about numba, and from what I have seen, there is no straightforward way to force it to keep the order of operations. Should we restructure the methods to force the order of operations? Or should we consider other options, like parallelizing these methods with MPI instead of numba? I suppose, however, that numba is much faster than MPI for these numpy operations in loops, since I understand it is optimized for exactly that.
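One possible restructuring that keeps the order of operations fixed while still using threads is to reduce over a fixed number of chunks and combine the partial results serially; a minimal sketch (a plain sum, not the actual hwp_sys loops):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def chunked_sum(x, nchunks):
    # Each chunk is summed left-to-right; the partial sums are then combined
    # in a fixed order, so the result does not depend on the thread count.
    partial = np.zeros(nchunks)
    chunk = (x.size + nchunks - 1) // nchunks
    for c in prange(nchunks):
        acc = 0.0
        for i in range(c * chunk, min((c + 1) * chunk, x.size)):
            acc += x[i]
        partial[c] = acc
    total = 0.0
    for c in range(nchunks):
        total += partial[c]
    return total

x = np.random.default_rng(0).standard_normal(1_000_000)
print(chunked_sum(x, 64))
```

The result still differs from a plain serial sum (it depends on the chunk count), but it is reproducible for a fixed nchunks regardless of how many threads run it.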
Thank you in advance for the discussion,
Miguel