-
Is the schema the same across the different files?
-
Just to add a bit of info on how the arrow files were generated. I made an elasticsearch query and stored the results as a pandas dataframe (
This was repeated for different ES query time periods. All of this was done first, in a separate script, before I tried to open the arrow files as a vaex dataframe.
-
Strange. Could you make a reproducible example, e.g. generate some data, export it, and see how long opening it takes for you, so we can try the same?
-
Thanks for the suggestion on another thread to try exporting the arrow file to hdf5. I tried that, and I can now open the file in less than 300 ms, and the memory usage seems to be minimal too. I'll convert all my arrow files to hdf5 then.
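For anyone wanting to do the same conversion, a minimal sketch might look like the following. It assumes vaex is installed and that all the arrow files sit in one directory. The helper names (`hdf5_path_for`, `convert_arrow_to_hdf5`, `convert_all`) are made up for illustration and are not from this thread; only `vaex.open()` and `DataFrame.export_hdf5()` are vaex calls.

```python
from pathlib import Path


def hdf5_path_for(arrow_path):
    """Return the sibling .hdf5 path for a given .arrow path (hypothetical helper)."""
    return Path(arrow_path).with_suffix(".hdf5")


def convert_arrow_to_hdf5(arrow_path):
    """Open one arrow file with vaex and export it next to the original as hdf5."""
    import vaex  # imported lazily so the path helper works without vaex installed

    df = vaex.open(str(arrow_path))
    out_path = hdf5_path_for(arrow_path)
    df.export_hdf5(str(out_path))
    return out_path


def convert_all(directory):
    """Convert every *.arrow file in `directory`; returns the list of new hdf5 paths."""
    arrow_files = sorted(Path(directory).glob("*.arrow"))
    return [convert_arrow_to_hdf5(p) for p in arrow_files]
```

The returned paths could then be passed (as strings) to `vaex.open_many()` to get a single dataframe over all the converted files.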
-
Hi,
I have multiple `.arrow` files, each about 1GB (the total file size is larger than my RAM). I tried to open all of them using `vaex.open_many()` to read them into a single dataframe, and saw that memory usage kept increasing and that it was taking longer than I expected.

So I tried opening just one file. What I noticed was that it takes about 4-5 seconds to open the file, and the free memory (as indicated by the `free` column returned by the command `free -h`) kept decreasing until it was ~1GB lower.

I thought that when opening arrow files, vaex would use memory-mapping and thus wouldn't actually use up so much memory, and that it would also be faster. Is my understanding correct, or am I doing something wrong?

ETA: Based on the documentation, I thought the file would open instantly. If I time the cell using `%time`, it does return in microseconds, but the cell continues to run for a few seconds, as shown by `%%time`.
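As a point of comparison for the memory-mapping expectation above, here is a small sketch using only Python's standard `mmap` module. It is not vaex's implementation and says nothing about how vaex reads arrow files. It only illustrates that mapping a file does not read its contents up front, and that only the pages actually touched are loaded. The function names and the demo file are hypothetical.

```python
import mmap
import os
import tempfile


def write_demo_file(size_bytes=16 * 1024 * 1024):
    """Create a temporary file of `size_bytes` bytes ending in the marker b'vaex'."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * (size_bytes - 4))
        f.write(b"vaex")
    return path


def read_tail(path, n=4):
    """Memory-map the whole file read-only and return only its last `n` bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this slice need to be read from disk.
            return mm[len(mm) - n:len(mm)]


if __name__ == "__main__":
    demo_path = write_demo_file()
    print(read_tail(demo_path))
    os.remove(demo_path)
```

One thing that may be worth checking: on Linux, pages read through a mapping are accounted as page cache, so the `free` column of `free -h` can drop while the `available` column stays high. Whether that explains the numbers seen here is only a guess.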