Replies: 7 comments
-
Hi Miguel, are you observing this 1e-9 discrepancy with double-precision data or with single precision? Cheers,
-
Hi @mreineck, I had the larger arrays in single precision to save memory. I put everything in double precision and I still get the same problem: about 0.03% of the pixels still show a difference bigger than 1e-9 between the output and input maps. Furthermore, this is the output of np.testing.assert_almost_equal(input_maps, output_maps, decimal=9, verbose=True):
So, in (at least) one of the pixels, the difference is as bad as 1e-4.
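For anyone reproducing this, a quick way to count the offending pixels is sketched below (input_maps / output_maps stand in for the real hwp_sys maps; the random data here is only a placeholder):

```python
import numpy as np

# Placeholder arrays; in the real test these are the hwp_sys input/output maps.
rng = np.random.default_rng(0)
input_maps = rng.standard_normal((3, 12 * 256**2))
output_maps = input_maps + rng.normal(scale=1e-10, size=input_maps.shape)

diff = np.abs(output_maps - input_maps)
n_bad = np.count_nonzero(diff > 1e-9)
print(f"{n_bad} pixels ({100 * n_bad / diff.size:.4f}%) above 1e-9; worst = {diff.max():.2e}")

# assert_allclose reports both the number and the size of the mismatches:
np.testing.assert_allclose(output_maps, input_maps, rtol=0.0, atol=1e-9)
```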
-
Hi @mj-gomes, good catch! Yes, instruction reordering might be the culprit here. Could you try using …? Otherwise, we could investigate the machine code produced by Numba using …
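For reference, Numba's compiled dispatchers expose the generated LLVM IR and assembly; a minimal sketch of that kind of inspection (the kernel below is a toy, not one of the hwp_sys functions):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def scale(x, a):
    out = np.empty_like(x)
    for i in prange(x.size):
        out[i] = a * x[i]
    return out

scale(np.ones(10), 2.0)  # trigger compilation for (float64[:], float64)

# inspect_asm() / inspect_llvm() map each compiled signature to the code Numba emitted.
for sig, asm in scale.inspect_asm().items():
    print(sig)
    print("\n".join(asm.splitlines()[:20]))  # just the first few lines
```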
-
Since the error is so large, we probably need to consider another potential source...
-
I dealt with similar problems recently while validating my map-maker against the binner of lbsim. While it is true that changing the dtype from f32 to f64 results in slightly different pointing angles, it would not change many pixel indices (assuming a small nside). On the other hand, the change in dtype also changes the polarization angles (including the orientation angle, hwp_angle, etc.). And since during map-making we compute, for each pixel, the sum of the sine, cosine, sine^2, cosine^2, and sine*cosine of the angles, the difference between the results from the two dtypes adds up quickly and becomes significantly large. Another point is that on-the-fly pointing computation happens in single precision. This includes the pointing computation in … Also, numpy recommends using … Anyway, in many of the tests I implemented in my code, the best relative difference I get from f32 is …
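To make the accumulation argument concrete, here is a small illustration (made-up angles and pixel indices, not lbsim code) of how the per-pixel sums of trigonometric terms drift between float32 and float64:

```python
import numpy as np

rng = np.random.default_rng(42)
npix = 12 * 64**2                        # hypothetical nside = 64 map
nsamp = 2_000_000
pix = rng.integers(0, npix, size=nsamp)  # pixel hit by each sample
psi = rng.uniform(0.0, 2 * np.pi, size=nsamp)  # stand-in polarization angles

def per_pixel_sums(angles, dtype):
    # Accumulate sum(sin(2*psi)) per pixel, with everything cast to `dtype`.
    a = angles.astype(dtype)
    s = np.zeros(npix, dtype=dtype)
    np.add.at(s, pix, np.sin(2 * a))
    return s

s32 = per_pixel_sums(psi, np.float32)
s64 = per_pixel_sums(psi, np.float64)
print("max per-pixel |f32 - f64| of sum(sin 2psi):", np.abs(s32 - s64).max())
```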
-
Thank you all for your help. This is what I have tried so far.
- …
- …
- Also, @mreineck, if I understood your comment correctly, I think the problem might not be there, as the angles passed to the ang2pix function are in double precision (at least I am calling get_pointings with …).
- p.s. This is not very important, I believe, but I also tried and noticed that the percentage of pixels with error increases with the value of NUMBA_NUM_THREADS (see the sketch after this list), which I suppose makes sense: as we increase the number of threads, numba reorders the operations more, and the result becomes more "chaotic" in terms of floating-point rounding.
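As an illustration of the thread-count dependence, a toy parallel reduction (not the hwp_sys kernels; note that numba.set_num_threads can only go up to the NUMBA_NUM_THREADS the process started with):

```python
import numpy as np
import numba
from numba import njit, prange

@njit(parallel=True)
def parallel_sum(x):
    total = 0.0
    for i in prange(x.size):
        total += x[i]
    return total

x = np.random.default_rng(0).standard_normal(10_000_000)
reference = float(np.sum(x))  # NumPy's pairwise summation as a reference

for n in range(1, numba.config.NUMBA_NUM_THREADS + 1):
    numba.set_num_threads(n)
    print(n, "threads -> |sum - reference| =", abs(parallel_sum(x) - reference))
```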
-
Hi, so, this problem comes up only in the computation of A^T A and A^T d (the mapmaking-on-the-fly method). The reason for this is specified in this comment:
When we compute A^T A and A^T d in parallel, different threads will be accessing the same pixel at the same time, resulting in a race condition. Luckily, this mapmaking-on-the-fly method is just a convenience and will not be the one used for simulations or real data analysis, so this problem can be ignored. I will set the default to parallel=False in this method anyway so that no one runs into this problem. The mapmaking on the fly will be slower than it ideally could be, but at least we now know that the TOD computation methods, which are the really important ones in hwp_sys.py, can be successfully parallelized without loss of precision. I will close this discussion as resolved, but feel free to add comments if something comes up.
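For anyone who hits this later, a minimal sketch of the failure mode and one race-free alternative (illustrative only; bin_map_racy / bin_map_partial are made-up names, not the hwp_sys functions):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def bin_map_racy(pix, tod, npix):
    # Race: two threads can read signal[p] at the same time and one update is
    # lost, so the binned map can change from run to run.
    signal = np.zeros(npix)
    hits = np.zeros(npix)
    for i in prange(pix.size):
        p = pix[i]
        signal[p] += tod[i]
        hits[p] += 1.0
    return signal, hits

@njit(parallel=True)
def bin_map_partial(pix, tod, npix, nchunks):
    # Race-free alternative: one partial map per chunk, combined afterwards.
    partial_sig = np.zeros((nchunks, npix))
    partial_hit = np.zeros((nchunks, npix))
    chunk = (pix.size + nchunks - 1) // nchunks
    for c in prange(nchunks):
        for i in range(c * chunk, min((c + 1) * chunk, pix.size)):
            p = pix[i]
            partial_sig[c, p] += tod[i]
            partial_hit[c, p] += 1.0
    return partial_sig, partial_hit

rng = np.random.default_rng(0)
npix = 12 * 16**2
pix = rng.integers(0, npix, size=1_000_000)
tod = rng.standard_normal(pix.size)

partial_sig, partial_hit = bin_map_partial(pix, tod, npix, 16)
signal, hits = partial_sig.sum(axis=0), partial_hit.sum(axis=0)
```

The per-chunk version costs extra memory (one partial map per chunk) but gives a result that does not depend on how the threads interleave.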
-
Hello everyone.
I came across an interesting problem with numba parallelization and its influence on the precision of the computations, at least for the TOD and mapmaking operations in hwp_sys.py.
Context: I was running some tests for the hwp_sys.py code and the tests kept failing, even though I was only testing the ideal case and I knew (from looking at the maps) that it should work: the output maps should be equal to the input maps. The problem was that a percentage of the pixels in (output maps - input maps) was not within the 1e-9 tolerance that I had set to allow for possible rounding. This percentage became smaller when increasing the nside.
Origin: I found out that this comes from the @njit(parallel=True) decorator, specifically the "parallel=True" part. Because addition and multiplication of floating-point numbers are not associative, and numba in parallel mode apparently changes the order of the operations, rounding errors accumulate and end up being fairly important. At least this is the best explanation I could find. And indeed, with parallel=False I get 100% equality within the 1e-9 tolerance.
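A quick way to see the non-associativity (plain Python, nothing specific to hwp_sys):

```python
# Floating-point addition is not associative: the grouping changes the rounding.
a, b, c = 1.0e16, -1.0e16, 0.1
print((a + b) + c)  # 0.1
print(a + (b + c))  # 0.0, because b + c rounds back to -1.0e16
```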
For nside 256, for example, using 2 detectors in a one-year simulation, I had about 0.03% of the pixels above the 1e-9 tolerance. This is a very small percentage of the pixels, but it should increase with the number of detectors. For now I wrote a test method that ignores this, but I was talking with @hivon about it and it would be good to know if anyone has already run into this problem and has a good solution. Setting parallel=False is, in my opinion, not an option, because the parallelization speeds things up by a significant amount (I can run some tests to obtain specific values for this speedup if needed).
I don't know much about numba, and from what I have seen, there is no straightforward way to force it to keep the order of operations. Should we restructure the methods to force the order of operations? Or should we consider other options, like parallelizing these methods with MPI instead of numba? I suppose, however, that numba is much faster than MPI for these numpy operations in loops, since I understand it is optimized for exactly that.
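One possible restructuring that keeps the order of operations fixed while still using threads is to reduce over a fixed number of chunks and combine the partial results serially; a minimal sketch (a plain sum, not the actual hwp_sys loops):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def chunked_sum(x, nchunks):
    # Each chunk is summed left-to-right; the partial sums are then combined
    # in a fixed order, so the result does not depend on the thread count.
    partial = np.zeros(nchunks)
    chunk = (x.size + nchunks - 1) // nchunks
    for c in prange(nchunks):
        acc = 0.0
        for i in range(c * chunk, min((c + 1) * chunk, x.size)):
            acc += x[i]
        partial[c] = acc
    total = 0.0
    for c in range(nchunks):
        total += partial[c]
    return total

x = np.random.default_rng(0).standard_normal(1_000_000)
print(chunked_sum(x, 64))
```

The result still differs from a plain serial sum (it depends on the chunk count), but it is reproducible for a fixed nchunks regardless of how many threads run it.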
Thank you in advance for the discussion,
Miguel