Where is a vaex dataframe stored if not in RAM? #1787

Answered by maartenbreddels
yohplala asked this question in Q&A

Good questions.

df.executor.chunk_size_for(len(df)) will tell you the chunk size it will use, which can also be configured (see https://vaex.io/docs/conf.html).
For an hdf5 file, all the data is memory mapped, which is similar to a swapfile. Only when an array gets accessed does the operating system check whether it is in RAM, and if it is not, it loads it from disk into RAM. For parquet, the story is a bit more complicated: we iterate/scan over the datasets with a bit of read-ahead, so a few chunks are in RAM at a time. Note, though, that each thread has its own chunks, so if you have 64 cores, you have 64 chunks in RAM.
The CPU cache can be ignored: it's only a speedup, but it doesn't …
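
To make the memory-mapping point above concrete, here is a minimal sketch using only Python's standard library. It is not vaex's own code; the file name, the column size, and the index being read are made up for illustration. It writes a column of float64 values to disk, maps the file, and reads one value, so the operating system only has to page in the part of the file that backs that value.

```python
import array
import mmap
import os
import struct
import tempfile

# Hypothetical stand-in for a column on disk: 1,000,000 float64 values.
n = 1_000_000
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    array.array("d", range(n)).tofile(f)

with open(path, "rb") as f:
    # Mapping the file does not read its contents; it only sets up address space.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_bytes = len(mm)

    # Accessing one element lets the OS load just the page(s) that hold it.
    index = 123_456
    (value,) = struct.unpack_from("d", mm, index * 8)
    mm.close()

print(mapped_bytes)  # 8000000
print(value)         # 123456.0
```

The whole 8 MB file is addressable through `mm`, but how much of it actually sits in RAM at any moment is decided by the operating system, which is the behaviour the answer describes for hdf5-backed dataframes.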

Answer selected by yohplala