Speed-up suggestion - only slice datasets once #323

@dfulu

Currently we slice each of our datasets in time and in space using separate calls to DataArray.sel() or DataArray.isel(). These selection calls are slow, and I saw somewhere (though I can't find it right now) that xarray recommends combining selections into a single call where possible.

I tried this locally, in a messy way, and got a 5-8% speed-up in the Dataset object. This is a nice speed-up, but it introduces a little more complexity to our code. Locally I needed to adapt the select_time_slice[_nwp]() functions to return dict[str, slice] objects for use with .isel(), instead of calling .sel() as we do now. I needed to store these, and to make the select_spatial_slice() function return dict[str, slice] objects as well. I then combined the space and time slices and applied .isel() once.
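To make the idea concrete, here is a minimal sketch of building the index slices first and applying them in one .isel() call. The helper names `time_isel_slice` and `spatial_isel_slice`, their signatures, and the toy array are assumptions for illustration only; they are not the existing select_time_slice[_nwp]() or select_spatial_slice() functions, and the sketch assumes a sorted time coordinate.

```python
import numpy as np
import xarray as xr


def time_isel_slice(da: xr.DataArray, start, end) -> dict[str, slice]:
    """Hypothetical helper: turn a label-based time window into an integer slice."""
    times = da["time"].values
    # searchsorted assumes the time coordinate is sorted in ascending order.
    i0 = int(np.searchsorted(times, start, side="left"))
    # side="right" keeps the end label inclusive, like .sel(time=slice(start, end)).
    i1 = int(np.searchsorted(times, end, side="right"))
    return {"time": slice(i0, i1)}


def spatial_isel_slice(x_centre: int, y_centre: int, half_width: int) -> dict[str, slice]:
    """Hypothetical helper: integer window around a pixel centre."""
    return {
        "x": slice(x_centre - half_width, x_centre + half_width),
        "y": slice(y_centre - half_width, y_centre + half_width),
    }


# Toy data standing in for one of the real datasets.
da = xr.DataArray(
    np.arange(24 * 10 * 10).reshape(24, 10, 10),
    dims=("time", "y", "x"),
    coords={"time": np.arange("2024-01-01T00", "2024-01-02T00", dtype="datetime64[h]")},
)

start = np.datetime64("2024-01-01T06")
end = np.datetime64("2024-01-01T11")

time_idx = time_isel_slice(da, start, end)
space_idx = spatial_isel_slice(x_centre=5, y_centre=4, half_width=2)

# Merge the space and time indexers and slice the array once.
combined = {**time_idx, **space_idx}
once = da.isel(combined)

# The current style for comparison: one selection call per step.
sequential = da.sel(time=slice(start, end)).isel(**space_idx)
```

Both paths should give the same array here; the difference is only in how many selection calls are made. Whether the speed-up carries over to the real datasets would need to be measured.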

This adds some complexity compared to our current approach, where separate space and time slicing functions each return a DataArray. But the current approach comes at a performance cost.

So is it worth refactoring to a single slice, or is the cleaner code worth keeping the method we have now?
