Where is a vaex dataframe stored if not in RAM? #1787
-
Hi,
Thanks for sharing your insights,
-
Good questions.
Does this help?
df.executor.chunk_size_for(len(df))
will tell you the chunk size it will use, which can also be configured (see https://vaex.io/docs/conf.html).
For an hdf5 file, all the data is memory mapped, which is similar to a swapfile. Only when the array gets accessed will the operating system find out whether it's in RAM, and if it's not, it will load it from disk into RAM.
For parquet, the story is a bit more complicated: we iterate/scan over the datasets, with a bit of read ahead, so a few chunks are in RAM at a time. Note, though, that each thread has its own chunks, so if you have 64 cores, you have 64 chunks in RAM.
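To make the two cases above a bit more concrete, here is a minimal sketch using only the Python standard library. It is not vaex code. `read_slice_via_mmap` shows the operating-system mechanism that memory mapping relies on, and `parquet_working_set_bytes` is a hypothetical helper that does the "one chunk per thread" arithmetic for a single column, ignoring read ahead. The chunk size of 1024**2 rows and the 8 bytes per row are assumed values for illustration, not vaex defaults.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, start, stop):
    """Return bytes [start:stop) of a file through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Creating the mapping does not
        # read the file contents into RAM.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing touches only the pages backing this range; the OS
            # loads them from disk if they are not already cached in RAM.
            return mm[start:stop]


def parquet_working_set_bytes(chunk_size, n_threads, bytes_per_row):
    """Rough estimate (hypothetical helper): one chunk held per thread."""
    return chunk_size * n_threads * bytes_per_row


if __name__ == "__main__":
    # Build a 16 MiB scratch file whose content repeats the bytes 0..255.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 65536)
        path = tmp.name
    try:
        # Only a 4-byte window of the 16 MiB file is accessed here.
        print(read_slice_via_mmap(path, 256, 260))  # b'\x00\x01\x02\x03'
    finally:
        os.remove(path)

    # Assumed numbers: 1024**2 rows per chunk, 64 threads, one float64 column.
    print(parquet_working_set_bytes(1024**2, 64, 8))  # 536870912 bytes = 512 MiB
```

With these assumed numbers, a single 8-byte column needs roughly 512 MiB across 64 threads, so lowering the chunk size or the thread count reduces the peak proportionally.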
CPU cache can be ignored, it's only a speedup, but it doesn't …