[Data] Avoid unnecessary conversion to Numpy when creating Arrow/Pandas blocks (#51238)
Context
---
This change skips the unnecessary blanket conversion to Numpy (previously
applied to every chunk of data) before converting to Pyarrow.
That conversion creates problems when batches contain Arrow-native `Scalars`:
because of it, they ultimately end up serialized as the
`ArrowPythonObjectType` extension.
Changes
---
We revisit the conversion logic and convert passed-in column values to
Numpy only in the following cases:
- Column name is `TENSOR_COLUMN_NAME` (for compatibility)
- Provided column values are already represented by a tensor (numpy,
torch, etc.)
- Provided column values are a list of ndarrays (we do this for
compatibility with the previous behavior, where all column values
were blindly converted to Numpy, leading to a list of ndarrays being
converted to a tensor)
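
The three cases above can be sketched as a single predicate. This is a rough illustration only, not the code from this commit: `should_convert_to_numpy` and `_is_tensor` are hypothetical names, the value assigned to `TENSOR_COLUMN_NAME` is an assumed stand-in for the real constant, and the torch check is a duck-typed approximation that avoids importing torch.

```python
import numpy as np

# Assumed stand-in for the real constant; the actual value may differ.
TENSOR_COLUMN_NAME = "__value__"


def _is_tensor(values) -> bool:
    """Hypothetical helper: True for numpy arrays and (duck-typed) torch tensors."""
    if isinstance(values, np.ndarray):
        return True
    # Approximate check for torch tensors without importing torch.
    return type(values).__module__.split(".")[0] == "torch"


def should_convert_to_numpy(column_name: str, values) -> bool:
    """Hypothetical predicate mirroring the three cases listed above."""
    # Case 1: the dedicated tensor column is always converted (compatibility).
    if column_name == TENSOR_COLUMN_NAME:
        return True
    # Case 2: values are already a tensor.
    if _is_tensor(values):
        return True
    # Case 3: a non-empty list made up entirely of ndarrays.
    if (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, np.ndarray) for v in values)
    ):
        return True
    # Everything else (e.g. plain Python values or Arrow scalars) is left
    # as-is and handed to the Arrow conversion directly.
    return False
```

Under these assumptions, a plain list such as `[1, 2, 3]` in a regular column is not converted, while `[np.zeros(2), np.ones(2)]` is.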
---------
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>