Clarifying the meaning of NA in .open_dataset docs #7914
The description of the parameter mask_and_scale says:

"If True, replace array values equal to _FillValue with NA and scale values according to the formula original_values * scale_factor + add_offset, where _FillValue, scale_factor and add_offset are taken from variable attributes (if they exist). If the _FillValue or missing_value attribute contains multiple values a warning will be issued and all array values matching one of the multiple values will be replaced by NA. mask_and_scale defaults to True except for the pseudonetcdf backend. This keyword may not be supported by all the backends."

It is honestly extremely confusing. There is NA (which is clearly not NaN), and there is a formula according to which _FillValue is constructed. Could anyone provide a clarification for this option, please?
Replies: 2 comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Apologies for the confusion, @yutik-nn.

Here's an example to illustrate this:

In [10]: import xarray as xr

In [11]: import numpy as np

In [12]: data = np.array([1.0, 2.0, 3.0, -9999.0, 5.0])  # -9999.0 will be our "fill value"

In [13]: da = xr.DataArray(
    ...:     data,
    ...:     dims='x',
    ...:     name='my_variable',
    ...:     attrs={
    ...:         '_FillValue': -9999.0,  # This is the value we'll use to represent missing data
    ...:         'scale_factor': 0.1,    # We'll use this to scale our data
    ...:         'add_offset': 1.0       # We'll add this offset to our data
    ...:     }
    ...: )

In [14]: ds = xr.Dataset({da.name: da})

In [15]: ds.to_netcdf('my_dataset.nc')

Now, let's read the data:

In [16]: dset = xr.open_dataset('my_dataset.nc', mask_and_scale=False, engine='netcdf4')

In [17]: dset_scaled = xr.open_dataset('my_dataset.nc', mask_and_scale=True, engine='netcdf4')

In [18]: dset.my_variable.data  # raw values as stored in the file
Out[18]: array([ 1.000e+00,  2.000e+00,  3.000e+00, -9.999e+03,  5.000e+00])

In [19]: dset_scaled.my_variable.data  # fill value masked to NaN, the rest scaled as value * 0.1 + 1.0
Out[19]: array([1.1, 1.2, 1.3, nan, 1.5])

I hope the explanation and code example help clarify how…
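The decoded output above can be reproduced by hand, which may make the two steps easier to see. The sketch below is not xarray's actual implementation; `decode_mask_and_scale` is a hypothetical helper written only for illustration, and it ignores details such as integer dtypes, the missing_value attribute, and multiple fill values. It shows that the formula is applied to the data values, while _FillValue is only used to decide which entries become NaN.

```python
import numpy as np


def decode_mask_and_scale(raw, fill_value, scale_factor=1.0, add_offset=0.0):
    """Hypothetical helper mimicking mask_and_scale=True for a single array."""
    raw = np.asarray(raw, dtype="float64")
    # Step 1 (mask): entries equal to the fill value become NaN ("NA" in the docs).
    masked = np.where(raw == fill_value, np.nan, raw)
    # Step 2 (scale): original_values * scale_factor + add_offset.
    # NaN propagates through the arithmetic, so masked entries stay NaN.
    return masked * scale_factor + add_offset


raw = [1.0, 2.0, 3.0, -9999.0, 5.0]
decoded = decode_mask_and_scale(raw, fill_value=-9999.0, scale_factor=0.1, add_offset=1.0)
print(decoded)
```

Masking happens before scaling here, so the fill value itself is never passed through the formula.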
NA in this context is not a separate concept from NaN. The distinction can sometimes be a bit confusing because different libraries or contexts may use NA or NaN to mean slightly different things, but in the context of Xarray, they are used interchangeably to represent missing data.

_FillValue: This attribute specifies a value that should be used to represent missing or undefined data. This is similar to how you might use NaN in NumPy to i…
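Since NA ends up as an ordinary floating-point NaN, the usual NaN rules apply when looking for missing entries. The following is a minimal NumPy sketch, with the array values typed in by hand to match the decoded output of the example; on an xarray object the equivalent check would be the isnull() method.

```python
import numpy as np

# Values copied by hand from the decoded example output.
decoded = np.array([1.1, 1.2, 1.3, np.nan, 1.5])

# NaN never compares equal to anything, including itself, so == cannot find it.
found_by_equality = decoded == np.nan

# np.isnan is the reliable test for missing entries.
missing = np.isnan(decoded)

# NaN-aware reductions skip the missing entries instead of returning NaN.
mean_of_valid = np.nanmean(decoded)

print(found_by_equality.any(), missing, mean_of_valid)
```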