Inquiry on the differences between two ways of calculating the dimensional mean #6081

ChenyaoYang123 · 2021-12-16T21:20:14Z


Hello,
I am new to Xarray and have a simple question.
Let's say we have a 3-D DataArray da with dimensions (time, lat, lon).
What is the difference between da.mean(dim="lat", skipna=True).mean(dim="lon", skipna=True) and da.mean(dim=["lon", "lat"], skipna=True)? With my data, the two ways of calculating the mean clearly give different results.
Many thanks in advance for your kind attention and help.

Best regards
Chenyao

Answered by andersy005


andersy005 · 2022-02-27T18:10:29Z

Maintainer

By definition, the mean is the sum of the values divided by the number of values. Unfortunately, adding up many values and then dividing accumulates floating-point rounding errors, and how much error accumulates depends on the order in which the additions happen. So, presuming your da contains floating-point numbers, the differences you are seeing between

da.mean(dim="lat", skipna=True).mean(dim="lon", skipna=True)

and

da.mean(dim=["lon", "lat"], skipna=True)

are due to differences in the floating-point error accumulated by the two operations. To minimize the accumulated floating-point error, you may want to use da.mean(dim=["lon", "lat"], ...) instead of the chained .mean("lon", ...).mean("lat", ...) operation.
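A minimal sketch of the floating-point effect, using plain NumPy on synthetic, NaN-free float32 data (the array shape, values, and tolerance are assumptions chosen for illustration). Axis 1 stands in for lat and axis 2 for lon, so the two reductions mirror the chained and joint xarray calls in the answer. Because every lat row has the same number of values, the two results are mathematically equal and should agree to within rounding:

```python
import numpy as np

# Hypothetical NaN-free float32 data shaped (time, lat, lon):
# axis 1 plays the role of "lat", axis 2 the role of "lon".
rng = np.random.default_rng(0)
a = rng.normal(loc=280.0, scale=10.0, size=(4, 180, 360)).astype(np.float32)

# Chained reduction, analogous to da.mean("lat").mean("lon")
chained = a.mean(axis=1).mean(axis=1)

# Joint reduction, analogous to da.mean(["lat", "lon"])
joint = a.mean(axis=(1, 2))

# The summation order differs, so the results may differ in the last bits,
# but the gap should be tiny relative to the values themselves.
print(np.abs(chained - joint).max())
print(np.allclose(chained, joint, rtol=1e-4))
```

Comparing such results with a tolerance (np.allclose, or xarray.testing.assert_allclose for DataArrays) rather than with == is the usual way to check whether two reductions agree.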

1 reply

max-sixty Feb 27, 2022
Maintainer

The other case where the two can differ is when the data contain nulls (NaN):

In [6]: xr.DataArray([[10, 1], [np.nan, 5]]).mean("dim_0", skipna=True).mean("dim_1", skipna=True)
Out[6]:
<xarray.DataArray ()>
array(6.5)

In [7]: xr.DataArray([[10, 1], [np.nan, 5]]).mean(skipna=True)
Out[7]:
<xarray.DataArray ()>
array(5.33333333)
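With NaNs present, the chained mean is an unweighted mean of partial means: in the example, the column-0 mean is built from one valid value (10) and the column-1 mean from two (1 and 5), yet both count equally, giving (10 + 3) / 2 = 6.5 instead of 16 / 3. A minimal sketch, reusing the array from the transcript, of a step-wise reduction that carries sums and valid counts separately (DataArray.sum and DataArray.count) and divides once at the end, which reproduces the joint mean:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([[10, 1], [np.nan, 5]], dims=("dim_0", "dim_1"))

# Unweighted chained mean: each per-column mean counts equally,
# even though column 0 has only one valid value.
chained = da.mean("dim_0", skipna=True).mean("dim_1", skipna=True)

# Joint mean over all valid values at once.
joint = da.mean(skipna=True)

# Step-wise reduction that keeps track of how many values went into each step:
partial_sum = da.sum("dim_0", skipna=True)  # per-column sum of non-NaN values
partial_count = da.count("dim_0")           # per-column number of non-NaN values
weighted = partial_sum.sum("dim_1") / partial_count.sum("dim_1")

print(float(chained), float(joint), float(weighted))
```

For NaN-free float data, the weighted and joint results can still differ slightly because of rounding, as discussed in the answer.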

ChenyaoYang123 · 2022-02-27T20:08:45Z

Author

Thank you both for your kind help. In my case, NaN values are quite common in the dataset, so I think this essentially explains the difference in the computed mean values. That said, the floating-point issue is also really interesting and definitely worth noting.
