-
You can use

    from datasets import Dataset

    def gen_from_pa_dataset(pa_dataset):
        # Stream the pyarrow dataset batch by batch, yielding one dict per row
        for pa_batch in pa_dataset.to_batches():
            yield from pa_batch.to_pylist()

    ds = Dataset.from_generator(gen_from_pa_dataset, gen_kwargs={"pa_dataset": train_ds})

But this use case is too specific to have a dedicated packaged builder, at least for now.
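If the label also needs to come from the directory name, the same generator approach can be extended. The following is only a minimal sketch under an assumed layout of `<root>/<split>/<label>/<file>.csv`; the function name `gen_labeled_rows` and its `root` and `split` parameters are hypothetical and not part of `datasets`. It uses only the standard library, so every value comes back as a string.

```python
import csv
from pathlib import Path


def gen_labeled_rows(root, split):
    """Yield one dict per CSV row, with a "label" key taken from the parent directory.

    Assumed (hypothetical) layout: <root>/<split>/<label>/<file>.csv
    """
    split_dir = Path(root) / split
    # Sort the paths so rows come out in a deterministic order
    for csv_path in sorted(split_dir.glob("*/*.csv")):
        label = csv_path.parent.name  # the directory name doubles as the label
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                yield {**row, "label": label}
```

This generator could then be passed to `Dataset.from_generator` once per split, for example with `gen_kwargs={"root": "data", "split": "train"}`, and the resulting datasets collected into a `DatasetDict` to recover the `train`/`test`/`validation` structure. For parquet files the loop body would need to read with pyarrow instead of `csv`.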
-
Hi!
Is it possible to infer the label from the directory structure for tabular datasets with many files (in either parquet or csv format)?
I see that it's possible for audio and image datasets, but I haven't managed to do the same for tabular ones.
Also, I tried with pyarrow tabular datasets and it works, but then of course you don't have the automatic train, test, validation identification.
Thanks a lot!