Inquiry on the differences between two ways of calculating the dimensional mean #6081
Hello, [...] Best regards
Replies: 2 comments 1 reply
The mean is, by definition, the sum of the values divided by the number of values. Unfortunately, adding all the values up and then dividing accumulates floating-point rounding errors. So, presuming your da consists of floating-point numbers, the differences you are seeing between da.mean(dim="lat", skipna=True).mean(dim="lon", skipna=True) and da.mean(dim=["lon", "lat"], skipna=True) are due to differences in the floating-point errors accumulated by the two operations. To minimize the accumulated error, you may want to use da.mean(dim=["lon", "lat"], ...) instead of the chained .mean("lon", ...).mean("lat", ...) operation.
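To get a feel for the size of the floating-point effect, here is a minimal sketch. It uses a plain NumPy array as a hypothetical stand-in for da (the grid shape, the random values and the float32 dtype are all assumptions, not taken from the original dataset) and compares a single reduction over both axes with the chained one.

```python
import numpy as np

# Hypothetical stand-in for `da`: a 180 x 360 (lat, lon) grid of float32 values.
rng = np.random.default_rng(0)
values = rng.random((180, 360)).astype(np.float32)

# Single reduction over both dimensions at once.
joint = float(values.mean(axis=(0, 1)))

# Chained reduction: average over lat (axis 0) first, then over lon.
chained = float(values.mean(axis=0).mean(axis=0))

# With no missing values, the two results agree up to rounding error.
difference = abs(joint - chained)
print(f"joint={joint:.8f} chained={chained:.8f} difference={difference:.2e}")
```

Without missing values, any difference printed here should be tiny compared with the values themselves. A noticeably larger gap on real data usually points to something other than rounding.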
Thank you both for your kind help. In my case, NaN values are quite common in the dataset, so I think that will essentially explain the difference in the computed mean values. Besides, I also think the floating-point issue is really interesting and definitely worth noting.
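The NaN effect can be sketched with a tiny example. With skipna=True, the chained version averages per-column means that were each computed from a different number of valid values, so it is a mean of means rather than the mean of all valid values. The data below is made up for illustration, and the dimension names simply follow the ones used in this thread.

```python
import numpy as np
import xarray as xr

# Made-up 2 x 3 (lat, lon) array in which the first lat row is mostly NaN.
da = xr.DataArray(
    np.array([[1.0, np.nan, np.nan],
              [2.0, 4.0, 6.0]]),
    dims=("lat", "lon"),
)

# Chained: per-lon means over lat are 1.5, 4.0 and 6.0, which are then averaged.
chained = da.mean(dim="lat", skipna=True).mean(dim="lon", skipna=True).item()

# Joint: every valid value counts once, (1 + 2 + 4 + 6) / 4.
joint = da.mean(dim=["lon", "lat"], skipna=True).item()

print(chained, joint)
```

Here the chained result is 11.5 / 3 (about 3.83) while the joint result is 3.25, a gap far larger than anything rounding error alone would produce.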