v4.3.0

@tsarkov tsarkov released this 07 May 13:09
· 2749 commits to master since this release

API:
• Add dataset.info.splits['train'].num_shards to expose the number of shards to the user
• Add tfds.features.Dataset to have a field containing sub-datasets (e.g. used in RL datasets)
• Add dtype and tf.uint16 support to tfds.features.Video
• Add DatasetInfo.license field to record redistribution information
• Better tfds.benchmark(ds): compatible with any iterator (not just tf.data), with a better Colab representation

Other:
• Faster tfds.as_numpy() (avoid extra tf.Tensor <> np.array copy)
• Better tfds.as_dataframe visualisation (Sequence, ragged tensor, semantic masks with use_colormap)
• (experimental) Community datasets support, which allows datasets defined outside the TFDS repository to be imported dynamically.
• (experimental) Add a Hugging Face compatibility wrapper to use Hugging Face datasets directly in TFDS.
• (experimental) Riegeli format support
• (experimental) Add DatasetInfo.disable_shuffling to force examples to be read in generation order.
• Add .copy and .format methods to GPath objects
• Many bug fixes

Testing:
• Support custom BuilderConfig in DatasetBuilderTest
• DatasetBuilderTest now has a dummy_data class property, which can be used in setUpClass
• Add add_tfds_id and cardinality support to tfds.testing.mock_data

And of course, many new datasets and dataset updates.

We would like to thank all the TFDS contributors!