Below, I use a 5,000-row DataFrame and the split-apply-combine paradigm. Using
The output reveals a dramatic difference in speed between an xarray groupby function (~27 seconds) and a pandas groupby method (0.02 seconds). Is this still a known issue, or am I not accessing flox properly?
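A minimal benchmark sketch of the comparison described above. The original code is not shown, so the data (50 integer group labels, normally distributed values) and the variable names are hypothetical; timings depend on the installed xarray, pandas and flox versions, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 5,000 rows, 50 integer group labels, one float column.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "group": rng.integers(0, 50, size=n),
    "value": rng.normal(size=n),
})

# pandas: split-apply-combine with groupby(...).first()
t0 = time.perf_counter()
res_pd = df.groupby("group").first()
t_pd = time.perf_counter() - t0

# xarray: the same reduction on a Dataset built from the DataFrame.
# The default RangeIndex becomes a dimension named "index".
ds = xr.Dataset.from_dataframe(df)
t0 = time.perf_counter()
res_xr = ds.groupby("group").first()
t_xr = time.perf_counter() - t0

print(f"pandas: {t_pd:.4f} s, xarray: {t_xr:.4f} s")
```

Both libraries sort the group labels, so the two results can be compared element by element.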
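We haven't enabled `first`, `last` with `flox` yet; I need to think about how to do it with dask. I would just use pandas since you can. Or go to `flox` directly if you really want array support: `flox.xarray.xarray_reduce(da, by, func="first")` (it'll work for numpy arrays; and `nanfirst` will work for numpy and dask). EDIT: Adding some tests to flox would be a very helpful contribution! (xarray-contrib/flox#29)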
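For plain numpy arrays, a "first valid value per group" reduction (the `nanfirst` semantics mentioned above) can be vectorized without a Python loop over groups. The helper `group_first` below is a hypothetical illustration built on `np.unique`. It is not flox's implementation or API.

```python
import numpy as np


def group_first(values, labels, skipna=True):
    """Hypothetical helper: first (optionally first non-NaN) value per group.

    Returns the sorted unique labels and one value per label; groups with
    no valid value get NaN.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Sorted unique labels plus an integer group code for every element.
    uniq, codes = np.unique(labels, return_inverse=True)
    # Positions that are allowed to count as "first" (skip NaNs if requested).
    mask = ~np.isnan(values) if skipna else np.ones(values.shape, dtype=bool)
    pos = np.flatnonzero(mask)
    out = np.full(uniq.shape, np.nan)
    # return_index gives the first occurrence of each group code among the
    # valid positions, which is the first valid element of that group.
    groups, first_idx = np.unique(codes[pos], return_index=True)
    out[groups] = values[pos[first_idx]]
    return uniq, out


uniq, out = group_first([np.nan, 2.0, 1.0, np.nan, 5.0], ["a", "b", "a", "c", "b"])
print(uniq, out)
```

With `skipna=True` this matches pandas' `groupby(...).first()`, which also skips missing values; group `"c"` has no valid value and comes back as NaN.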
Deepak, thanks for this answer. I tested with mean() to get a comparison, and you are right: mean() is a whole lot faster. As for going xarray -> pandas -> xarray, I have a strong preference for staying in one package for data manipulation; switching between packages all the time creates too much mental friction when I am not in Python on a daily basis. I will play with going to flox directly and will consider contributing tests, although I know ZERO about dask. I typically work with in-memory datasets, but really like labels :-) Here is the code and test results showing that .mean() is indeed faster than .first().
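One property that makes a group mean easy to vectorize is that it decomposes into per-group sums and counts, and `np.bincount` computes both in a single pass over the data. The helper `group_mean` below is a hypothetical numpy illustration of that property only. It does not claim to show how xarray or flox compute `mean()` internally.

```python
import numpy as np


def group_mean(values, labels):
    """Hypothetical helper: per-group mean from per-group sums and counts."""
    values = np.asarray(values, dtype=float)
    # Sorted unique labels plus an integer group code for every element.
    uniq, codes = np.unique(np.asarray(labels), return_inverse=True)
    # One pass each for the sums and the counts; every label in uniq occurs
    # at least once, so no count is zero.
    sums = np.bincount(codes, weights=values, minlength=uniq.size)
    counts = np.bincount(codes, minlength=uniq.size)
    return uniq, sums / counts


labels_m, means = group_mean([1.0, 2.0, 3.0, 4.0], ["x", "y", "x", "y"])
print(labels_m, means)
```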
Update: the latest
Thanks for the follow-up!! It does seem to work faster. It is still orders of magnitude slower than pandas, but the time reduction achieved over older flox versions dramatically increases the usability of this workflow.