Using IterableDataset with Torch DataLoader throws error. #2577
PyTorch only recognizes
Or you can customize with something like this:
Reference: datasets/src/datasets/iterable_dataset.py, lines 480 to 494 in c722810
Hi,
I have a general question about using a Dataset in streaming mode: is IterableDataset not meant to be used with the PyTorch DataLoader? I can use Dataset with the DataLoader without any issues (as is also mentioned in the examples), but I cannot do so with IterableDataset. I am quite new to the HF Datasets library, so my apologies if this is already mentioned somewhere (I am still looking).
I get the following error, which makes sense because this is streaming mode, but I am unclear about how to design things so that I can still do batching:
File "/data/leshekha/lib/HFDatasets/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 67, in __iter__
return iter(range(len(self.data_source)))
TypeError: object of type 'IterableDataset' has no len()
Any help is appreciated. Thank you.
I have asked the same question here: https://discuss.huggingface.co/t/roadmap-timeline-for-dataset-streaming/6789/5
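For anyone landing here with the same batching question: the traceback shows a sampler calling len() on the dataset, which a stream cannot answer. Batching itself does not need a length, though. Below is a minimal sketch in plain Python, not the code from the reply in this thread. The names batched_stream and fake_streaming_examples are hypothetical, and the fake generator only stands in for a streaming dataset.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_stream(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to `batch_size` items from a stream of unknown length.

    Hypothetical helper for illustration; it never calls len() on the stream.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # Pull at most `batch_size` items; an empty list means the stream ended.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_streaming_examples(n: int) -> Iterator[Dict[str, str]]:
    """Assumed stand-in for a streaming dataset: a generator with no __len__."""
    for i in range(n):
        yield {"text": f"example {i}"}


# Collect the batches; the last one is shorter because 5 is not a multiple of 2.
batches = list(batched_stream(fake_streaming_examples(5), batch_size=2))
```

The same loop could presumably live inside the __iter__ method of a torch.utils.data.IterableDataset subclass, which is the iterable-style type that DataLoader handles without a length-based sampler. That part is not shown here and is untested against this thread's setup.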