Replies: 3 comments 1 reply
-
Note that xarray isn't very good at storing sparse data; in my experience you quickly run into memory issues. I believe it is possible to initialize it with a scipy.sparse matrix, but it is not really meant to be used that way (and working directly with scipy and dictionaries is often more straightforward). Storing only the non-null exchanges in a dataframe makes a lot more sense IMO (but since I do not really have a good mental picture of the prototype in question, I may be wrong).
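To make the "only store non-null exchanges" idea concrete, here is a minimal pure-Python sketch. The data, the matrix shape, and the helper name `get` are all made up for illustration; nothing here comes from the prototype itself.

```python
# Hypothetical example data: one (row, col, amount) triple per non-null exchange.
exchanges = [
    (0, 0, 1.0),
    (0, 2, -0.5),
    (3, 1, 2.25),
]
n_rows, n_cols = 1000, 1000  # assumed matrix shape, for comparison only

# Dictionary-of-keys storage: only cells that hold a value take up memory.
sparse = {(row, col): amount for row, col, amount in exchanges}


def get(row, col):
    """Return the stored amount, or 0.0 for a cell that was never set."""
    return sparse.get((row, col), 0.0)


dense_cells = n_rows * n_cols   # what a dense array would have to allocate
stored_cells = len(sparse)      # what the sparse layout actually stores
print(f"dense: {dense_cells} cells, sparse: {stored_cells} cells")
```

The same triples map directly onto a long-form dataframe with three columns, or onto `scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols))` if matrix operations are needed later.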
-
Another candidate: datatable. Their pitch: "a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum speed possible". It seems to be faster than polars on some operations.
-
This feels like premature optimization. Let's build a data API which allows data to be returned as NumPy arrays (both regular arrays and record arrays), DataFrames, and plain Python or JavaScript objects, and build something more complicated only when we really need it.
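Roughly what I have in mind, as a sketch only. The class name `ExchangeTable`, its fields, and the method names are hypothetical, not an existing API. NumPy and pandas are imported lazily so the plain-Python path has no dependencies.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExchangeTable:
    """Hypothetical container; the name and fields are illustrative only."""

    columns: tuple
    rows: list = field(default_factory=list)

    def to_records(self):
        """Plain Python objects: a list of dicts, easy to serialize to JSON."""
        return [dict(zip(self.columns, row)) for row in self.rows]

    def to_numpy(self):
        """Regular 2-D NumPy array (NumPy is imported only when requested)."""
        import numpy as np
        return np.array(self.rows)

    def to_recarray(self):
        """NumPy record array with one named field per column."""
        import numpy as np
        return np.rec.fromrecords(self.rows, names=list(self.columns))

    def to_dataframe(self):
        """pandas DataFrame (pandas is imported only when requested)."""
        import pandas as pd
        return pd.DataFrame(self.rows, columns=list(self.columns))


# Made-up example data.
table = ExchangeTable(
    columns=("row", "col", "amount"),
    rows=[(0, 0, 1.0), (3, 1, 2.25)],
)
records = table.to_records()
as_json = json.dumps(records)  # what a JavaScript client would receive
```

The storage behind such an interface can stay simple for now and be swapped for something more elaborate later without changing what callers see.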
-
Here are some Pandas alternatives:
There are also Big Data alternatives like Spark, Snowflake, Databricks, and dbt.
We don't have great criteria for choosing between these alternatives. This might be one case where we postpone making a decision.