Hi! You can check #2252 to find more info on this behavior. We plan to add an option to shard arrow files on save to address this.
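Until such an option exists, one possible workaround is to shard manually: save the dataset as several smaller pieces and concatenate them on load. The sketch below assumes the `Dataset.shard`, `save_to_disk`, `load_from_disk`, and `concatenate_datasets` functions from `datasets`. The helper names and the `shard-XXXXX-of-XXXXX` directory naming are hypothetical, and whether this actually shortens the first load depends on the cause discussed in #2252.

```python
from pathlib import Path


def shard_dirs(out_dir, num_shards):
    """Return one directory path per shard (hypothetical naming scheme)."""
    return [
        Path(out_dir) / f"shard-{i:05d}-of-{num_shards:05d}"
        for i in range(num_shards)
    ]


def save_sharded(ds, out_dir, num_shards):
    """Save `ds` (a datasets.Dataset) as `num_shards` contiguous pieces."""
    for i, path in enumerate(shard_dirs(out_dir, num_shards)):
        # contiguous=True keeps the original row order within and across shards
        piece = ds.shard(num_shards=num_shards, index=i, contiguous=True)
        piece.save_to_disk(str(path))


def load_sharded(out_dir):
    """Load every shard directory under `out_dir` and concatenate them in order."""
    # Imported lazily so the path helper above works without `datasets` installed.
    from datasets import concatenate_datasets, load_from_disk

    # Zero-padded names sort lexicographically in shard order.
    paths = sorted(p for p in Path(out_dir).iterdir() if p.is_dir())
    return concatenate_datasets([load_from_disk(str(p)) for p in paths])
```

For example, `save_sharded(ds, "my_dataset_shards", num_shards=64)` followed by `load_sharded("my_dataset_shards")` should return a dataset with the same rows in the same order. The path and shard count here are placeholders.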
I have a very large dataset with 32M examples, stored as an .arrow table using `save_to_disk`. When I use `load_from_disk` to load this dataset for the first time (for example, the first time after a reboot), it is really slow and takes more than 10 minutes to complete. Every subsequent call to `load_from_disk` is very fast and completes in a fraction of a second. Why does this happen? Is it due to some caching in memory? Can the cache be created on disk instead?
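To make the first-call versus later-call difference concrete, it can help to time the calls directly. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the loader is passed in as an argument, so nothing here depends on `datasets` internals.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_repeated(fn, *args, repeats=3, **kwargs):
    """Call fn several times in a row and return the elapsed time of each call."""
    timings = []
    for _ in range(repeats):
        _, elapsed = time_call(fn, *args, **kwargs)
        timings.append(elapsed)
    return timings
```

Running `time_repeated(load_from_disk, "path/to/dataset")` right after a reboot (the path is a placeholder) should show one large first timing followed by much smaller ones if the behavior described above is reproduced. If the speed-up comes from the operating system keeping the file in its cache, which is an assumption rather than something confirmed in this thread, the slow first call would return after the next reboot.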