Replies: 1 comment
We don't have an official API to push data to the Hub (or remote storage) on the fly without caching it beforehand, but it shouldn't be too hard to implement manually.
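A minimal sketch of such a manual, batch-wise approach: stream the dataset so nothing is fully cached, buffer records into batches, and push each shard to the Hub as soon as it is written. `load_dataset(..., streaming=True)` and `HfApi.upload_file` are real `datasets` / `huggingface_hub` APIs, but the dataset name, repo id, and shard-writing step below are placeholders, not an official recipe.

```python
# Sketch: process a dataset batch-wise so the full dataset never has to
# sit in the local cache at once. Only the batching helper is concrete;
# the upload loop is illustrative.
import itertools


def iter_batches(iterable, batch_size):
    """Yield lists of up to `batch_size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


# With a streaming dataset, each batch could be written to a small local
# shard and uploaded immediately, then deleted, e.g. (placeholder names):
#
#   from datasets import load_dataset
#   from huggingface_hub import HfApi
#
#   ds = load_dataset("some/dataset", split="train", streaming=True)
#   api = HfApi()
#   for i, batch in enumerate(iter_batches(ds, 10_000)):
#       # ...write `batch` to f"shard-{i:05d}.parquet" locally...
#       api.upload_file(
#           path_or_fileobj=f"shard-{i:05d}.parquet",
#           path_in_repo=f"data/shard-{i:05d}.parquet",
#           repo_id="your-user/your-repo",
#           repo_type="dataset",
#       )
#       # ...delete the local shard before the next iteration...
```

This keeps peak local storage at roughly one shard instead of the whole dataset; the trade-off is one upload round-trip per shard, so the batch size should be large enough to amortize that.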
concerning https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DatasetBuilder.download_and_prepare
Hi,
As I understand from https://huggingface.co/docs/datasets/filesystems#download-and-prepare-a-dataset-into-a-cloud-storage, the mentioned function first loads the full dataset into the cache, then processes it and uploads it to the cloud storage. Is there a way to do this batch-wise, so that I don't have to load the full https://huggingface.co/datasets/wiki_dpr/discussions?status=open&type=discussion dataset into my cache at once?
Am I missing something here?
(Beyond that is there a best practice to load the above mentioned dataset to a cloud storage?)
thanks