Release v4.3.0 · tensorflow/datasets

API:
• Add dataset.info.splits['train'].num_shards to expose the number of shards to the user
• Add tfds.features.Dataset to have a field containing sub-datasets (e.g. used in RL datasets)
• Add dtype and tf.uint16 support for tfds.features.Video
• Add a DatasetInfo.license field for redistribution (licensing) information
• Better tfds.benchmark(ds): compatible with any iterator (not just tf.data), with a better Colab representation
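tfds.benchmark(ds) now accepts any iterator rather than only a tf.data pipeline. To give a rough idea of what benchmarking an arbitrary iterable involves, here is a stdlib-only sketch. `benchmark_iterable` and its result keys are hypothetical, and this is not the TFDS implementation. It times the first element separately, since pipelines often pay a one-off warm-up cost, then reports steady-state throughput over the remaining elements.

```python
import itertools
import time
from typing import Iterable, Optional


def benchmark_iterable(it: Iterable, num_iter: Optional[int] = None) -> dict:
    """Hypothetical helper (not tfds.benchmark): time iteration over any iterable.

    num_iter, if given, caps the total number of examples consumed.
    """
    if num_iter is not None and num_iter < 1:
        raise ValueError("num_iter must be >= 1")
    iterator = iter(it)

    # Time the first element on its own: it often includes warm-up work
    # such as opening files or filling prefetch buffers.
    start = time.perf_counter()
    try:
        next(iterator)
    except StopIteration:
        return {"num_examples": 0, "first_example_s": 0.0, "examples_per_s": 0.0}
    first_done = time.perf_counter()

    # Consume the remaining elements (up to num_iter in total).
    rest = iterator if num_iter is None else itertools.islice(iterator, num_iter - 1)
    count = 1
    for _ in rest:
        count += 1
    end = time.perf_counter()

    # Steady-state throughput excludes the first (warm-up) element.
    steady, elapsed = count - 1, end - first_done
    if steady == 0:
        rate = 0.0
    elif elapsed > 0:
        rate = steady / elapsed
    else:
        rate = float("inf")
    return {
        "num_examples": count,
        "first_example_s": first_done - start,
        "examples_per_s": rate,
    }


if __name__ == "__main__":
    # Works the same for a list, a range, or a generator.
    print(benchmark_iterable(x * x for x in range(100_000)))
```

Because the helper only relies on Python's iterator protocol, the same call works for a plain generator, a list, or an iterator returned by tfds.as_numpy(...). For real measurements, use tfds.benchmark itself.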

Other:
• Faster tfds.as_numpy() (avoid extra tf.Tensor <> np.array copy)
• Better tfds.as_dataframe visualisation (Sequence, ragged tensor, semantic masks with use_colormap)
• (experimental) Community datasets support, to allow dynamically importing datasets defined outside the TFDS repository.
• (experimental) Add a Hugging Face compatibility wrapper to use Hugging Face datasets directly in TFDS.
• (experimental) Riegeli format support
• (experimental) Add DatasetInfo.disable_shuffling to force examples to be read in generation order.
• Add .copy, .format methods to GPath objects
• Many bug fixes
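DatasetInfo.disable_shuffling only affects the order in which examples are stored, so a simplified model can make the difference concrete. The sketch below assumes that, by default, generated examples are reordered deterministically by a hash of their key (MD5 is only a stand-in here, not necessarily the hash TFDS uses), and that disable_shuffling keeps the generation order instead. `write_order` is a hypothetical name. To read examples back in that order, file shuffling presumably also has to be off at load time (for example `shuffle_files=False` in tfds.load).

```python
import hashlib
from typing import Iterable, List, Tuple

# (key, features) pairs, as a dataset generator might yield them.
Example = Tuple[str, dict]


def write_order(examples: Iterable[Example], disable_shuffling: bool) -> List[Example]:
    """Simplified, hypothetical model of the on-disk example order.

    Assumption: by default examples are sorted by a hash of their key, which
    gives a shuffled but reproducible order (MD5 is only a stand-in hash).
    With disable_shuffling=True the generation order is kept as-is.
    """
    examples = list(examples)
    if disable_shuffling:
        return examples
    return sorted(examples, key=lambda ex: hashlib.md5(ex[0].encode("utf-8")).hexdigest())


if __name__ == "__main__":
    # e.g. steps of an RL episode, where order matters.
    generated = [(f"step_{i}", {"t": i}) for i in range(5)]
    print([ex[1]["t"] for ex in write_order(generated, disable_shuffling=True)])  # [0, 1, 2, 3, 4]
    print([ex[1]["t"] for ex in write_order(generated, disable_shuffling=False)])
```

The hash-sorted order is scrambled but identical on every run, which is why generation order has to be requested explicitly when it matters, as with sequential RL episodes.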

Testing:
• Support custom BuilderConfig in DatasetBuilderTest
• DatasetBuilderTest now has a dummy_data class property which can be used in setUpClass
• Add add_tfds_id and cardinality support to tfds.testing.mock_data

And of course, many new datasets and dataset updates.

We would like to thank all the TFDS contributors!
