You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Compute a weighted moving average with respect to the coordinate instead of a simple N-point moving average.
Some more details
There exist some entries regarding weighted moving average (e.g. here or even in the xarraydocumentation), however, the examples I found are all referring to a "simple" N-point (weighted) moving average. What I'd like to do is to compute the weighted moving average taking into account the underlying coordinate, e.g. within a window of 10 days instead of 10 consecutive data points.
Is there a way of doing this efficiently using xarray? My current workaround consists of transforming the DataArray to a pandas Dataframe, accessing and iterating over the index which is painfully slow (example code for this below).
Why is this important? - An example
Figure A: Imagine a (1) regularly sampled time series and another (2) derived from irregularly sub-sampling time series (1) over a span of 1000 years (or days or whatever you want ;) ). As an example, the sub-sampling frequency is high at the beginning but becomes much coarser later on. This example is motivated from a real-world paleoclimate problem where you typically have a higher sampling for younger times and a lower temporal resolution the further back in time you go. Furthermore, a low-frequency signal on the centennial time scale may be superimposed onto a higher-frequency signal on the decadal time scale.
Figure B: Apply a moving average with a Gaussian kernel to obtain only the centennial variations. For the regularly sampled time series (1), a weighted moving average (implementation follows the example here ) works fine. However, when the sampling is irregular, as in the second time series, the result is less satisfactory because there is too little smoothing where the sampling is high (at the beginning) and too much smoothing where the sampling is low (at the end).
Figure C: The desired result is to apply a moving average using a Gaussian kernel taking into account the underlying time coordinate. Here, a Gaussian kernel with a standard deviation of 25 years is chosen, which effectively smooths out the decadal variations. The implementation, however, 1) is reliant on pandas, 2) is painfully slow and 3) only works for 1D arrays. So, any tips & tricks on how to overcome any (or even all) of the above would be greatly appreciated :)
Minimal working example
Click me (code)
# %%importmatplotlib.pyplotaspltimportnumpyasnpimportpandasaspdimportseabornassnsimportxarrayasxrdefgaussian(x, mu, std):
returnnp.exp(-np.power(x-mu, 2.) / (2*std**2))
defgaussian_weights(window, std):
win_half=window//2steps=np.arange(-win_half, win_half+1)
weights=gaussian(steps, 0, std)
returnxr.DataArray(weights, dims=['window'])
defcoordinate_aware_gaussian_mean_dataframe(df, ref_df, window, std):
'''DataFrame implementation of Gaussian moving average using index. This implementation differs from the standard pandas imeplementation as it constructs a smoothing window based on the actual time, not the number of time steps. That is, a window of 500 constructs a weight window of 500 years sampled at the original resolution. This allows to calculate a Gaussian rolling mean for IRREGULAR time steps.'''# Local smoothing window // df.name -> age indexstart=df.name-window//2end=df.name+window//2x=ref_df.loc[start:end]
# Weights based on Gaussian distribution centered at current locationweights=gaussian(x.index.values, mu=df.name, std=std)
weights=pd.DataFrame(
np.array([weights] *x.shape[1]).T, columns=x.columns, index=x.index
)
# Reintroduce NaNs into weights for correct calculationsweights=weights.where(~x.isnull())
# Weighted meanx_mean= (x*weights).sum() /np.sum(weights)
returnx_meandefcoordinate_aware_gaussian_mean_dataarray(da, window, std):
'''Ugly xarray wrapper for pandas implementation of coordiante aware Gaussian moving average. Only works for a 1D DataArray... '''new=da.copy() *np.nandf=da.to_dataframe()
smoothed=df.apply(lambdax: coordinate_aware_gaussian_mean_dataframe(x, df, window, std)[0], axis=1)
new.loc[:] =smoothed.values.squeeze()
returnnew# Define underlying signaltime=np.arange(0, 1000)
T1=250T2=50signal=np.sin(2*np.pi*time/T1) +np.sin(2*np.pi*time/T2)
# Add noiserng=np.random.default_rng(7)
signal+=.3*rng.standard_normal(size=signal.size)
# Time series (1) // regular samplingda1=xr.DataArray(signal, dims=['time'], coords=dict(time=time))
da1.name='linear'# Take subsample at irregular time stepsnonlinear_time_steps=.00001*np.arange(100)**4nonlinear_time_steps=nonlinear_time_steps.astype(int)
# Time series (2) // irregular samplingda2=da1[nonlinear_time_steps]
da2.name='nonlinear'# N-point moving averagekernel1=dict(window=101, std=25)
kernel2=dict(window=9, std=2)
weights1=gaussian_weights(**kernel1)
weights2=gaussian_weights(**kernel2)
da1_smooth_points=da1.rolling(time=kernel1['window'], center=True).construct('window').dot(weights1) /weights1.sum()
da2_smooth_points=da2.rolling(time=kernel2['window'], center=True).construct('window').dot(weights2) /weights2.sum()
# N-years moving average (coordinate aware)da1_smooth_coords=coordinate_aware_gaussian_mean_dataarray(da1, std=25, window=100)
da2_smooth_coords=coordinate_aware_gaussian_mean_dataarray(da2, std=25, window=100)
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
What I want to achieve
Compute a weighted moving average with respect to the coordinate instead of a simple N-point moving average.
Some more details
There exist some entries regarding weighted moving average (e.g. here or even in the
xarray
documentation), however, the examples I found are all referring to a "simple" N-point (weighted) moving average. What I'd like to do is to compute the weighted moving average taking into account the underlying coordinate, e.g. within a window of 10 days instead of 10 consecutive data points.Is there a way of doing this efficiently using
xarray
? My current workaround consists of transforming the DataArray to a pandas Dataframe, accessing and iterating over the index which is painfully slow (example code for this below).Why is this important? - An example
Figure A: Imagine a (1) regularly sampled time series and another (2) derived from irregularly sub-sampling time series (1) over a span of 1000 years (or days or whatever you want ;) ). As an example, the sub-sampling frequency is high at the beginning but becomes much coarser later on. This example is motivated from a real-world paleoclimate problem where you typically have a higher sampling for younger times and a lower temporal resolution the further back in time you go. Furthermore, a low-frequency signal on the centennial time scale may be superimposed onto a higher-frequency signal on the decadal time scale.
Figure B: Apply a moving average with a Gaussian kernel to obtain only the centennial variations. For the regularly sampled time series (1), a weighted moving average (implementation follows the example here ) works fine. However, when the sampling is irregular, as in the second time series, the result is less satisfactory because there is too little smoothing where the sampling is high (at the beginning) and too much smoothing where the sampling is low (at the end).
Figure C: The desired result is to apply a moving average using a Gaussian kernel taking into account the underlying time coordinate. Here, a Gaussian kernel with a standard deviation of 25 years is chosen, which effectively smooths out the decadal variations. The implementation, however, 1) is reliant on
pandas
, 2) is painfully slow and 3) only works for 1D arrays. So, any tips & tricks on how to overcome any (or even all) of the above would be greatly appreciated :)Minimal working example
Click me (code)
Click me (figure)
Beta Was this translation helpful? Give feedback.
All reactions