v4.3.0
API:
• Add dataset.info.splits['train'].num_shards to expose the number of shards to the user
• Add tfds.features.Dataset to have a field containing sub-datasets (e.g. used in RL datasets)
• Add dtype and tf.uint16 support for tfds.features.Video
• Add DatasetInfo.license field to add redistribution information
• Better tfds.benchmark(ds) (compatible with any iterator, not just tf.data, better colab representation)
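As a rough illustration of the num_shards and tfds.benchmark bullets above, here is a minimal sketch. The helper names split_summary and benchmark_iterable are hypothetical, and the fake builder is only a stand-in that mimics the info.splits[...] attribute layout, so the first helper can be tried without TFDS installed. The tensorflow_datasets import is deferred to the one function that needs it.

```python
from types import SimpleNamespace


def split_summary(builder, split="train"):
    """Return shard and example counts for one split of a builder-like object."""
    split_info = builder.info.splits[split]
    return {
        "num_shards": split_info.num_shards,
        "num_examples": split_info.num_examples,
    }


def benchmark_iterable(iterable, batch_size=1):
    """Benchmark any iterable (not only a tf.data.Dataset) with tfds.benchmark."""
    import tensorflow_datasets as tfds  # deferred: requires TFDS >= 4.3.0

    return tfds.benchmark(iterable, batch_size=batch_size)


# Stand-in for a real builder (assumption: same attribute layout as TFDS).
fake_builder = SimpleNamespace(
    info=SimpleNamespace(
        splits={"train": SimpleNamespace(num_shards=4, num_examples=1000)}
    )
)
print(split_summary(fake_builder))
```

With TFDS installed, the same helper is expected to accept a real builder, for example split_summary(tfds.builder("mnist")), and benchmark_iterable can be handed a plain Python generator as well as a tf.data pipeline.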
Other:
• Faster tfds.as_numpy() (avoid extra tf.Tensor <> np.array copy)
• Better tfds.as_dataframe visualisation (Sequence, ragged tensor, semantic masks with use_colormap)
• (experimental) Community datasets support, to allow dynamically importing datasets defined outside the TFDS repository.
• (experimental) Add a Hugging Face compatibility wrapper to use Hugging Face datasets directly in TFDS.
• (experimental) Riegeli format support
• (experimental) Add DatasetInfo.disable_shuffling to force examples to be read in generation order.
• Add .copy and .format methods to GPath objects
• Many bug fixes
Testing:
• Support custom BuilderConfig in DatasetBuilderTest
• DatasetBuilderTest now has a dummy_data class property which can be used in setUpClass
• Add add_tfds_id and cardinality support to tfds.testing.mock_data
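The mock_data bullet above could be exercised roughly as follows. This is a sketch under stated assumptions: the helper names has_tfds_ids and load_mocked are hypothetical, "mnist" is only an example dataset name, and it assumes the id is requested through tfds.ReadConfig(add_tfds_id=True) and that tensorflow_datasets and TensorFlow are installed. The import is deferred so the pure-Python check can be used on its own.

```python
def has_tfds_ids(examples):
    """Return True if there is at least one example and all carry a 'tfds_id' key."""
    examples = list(examples)
    return bool(examples) and all("tfds_id" in ex for ex in examples)


def load_mocked(name="mnist", num_examples=3):
    """Load `name` with fake data and return (examples, cardinality).

    Assumption: `name` is a dataset known to the installed TFDS version.
    """
    import tensorflow_datasets as tfds  # deferred: only needed in this function

    # Ask the reader to attach a 'tfds_id' field to every example.
    read_config = tfds.ReadConfig(add_tfds_id=True)
    with tfds.testing.mock_data(num_examples=num_examples):
        ds = tfds.load(name, split="train", read_config=read_config)
        cardinality = int(ds.cardinality().numpy())
        examples = list(tfds.as_numpy(ds))
    return examples, cardinality
```

Under those assumptions, a test might call examples, cardinality = load_mocked() and then check that cardinality equals the requested num_examples and that has_tfds_ids(examples) holds.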
And of course, many new datasets and dataset updates.
We would like to thank all the TFDS contributors!