v4.6.0

@pierrot0 released this 02 Jun 09:21

Added

  • Support for community datasets on GCS.
  • [API] tfds.builder_from_directory and tfds.builder_from_directories, see
    https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
  • [API] Dash ("-") support in split names.
  • [API] file_format argument to download_and_prepare method, allowing user
    to specify an alternative file format to store prepared data (e.g. "riegeli").
  • [API] file_format to DatasetInfo string representation.
  • [API] Expose the return value of Beam pipelines. This allows users to
    read the Beam metrics.
  • [API] Expose Feature tf_example_spec to public.
  • [API] doc kwarg on Features, to describe a feature.
  • [Documentation] Features description is shown on TFDS Catalog.
  • [Documentation] More metadata about HuggingFace datasets in TFDS catalog.
  • [Performance] Parallel load of metadata files.
  • [Testing] TFDS tests are now run using GitHub Actions, with miscellaneous
    improvements such as caching and sharding.
  • [Testing] Improvements to MockFs.
  • New datasets.

Changed

  • [API] num_shards is now optional in the shard name.

Removed

Fixed

  • Various datasets.
  • Dataset builders that are defined ad hoc (e.g. in Colab).
  • Better DatasetNotFoundError messages.
  • Don't set deterministic on a global level but locally in interleave, so it
    only applies to interleave and not to all transformations.
  • Google drive downloader.
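
To illustrate the difference between the global and the local deterministic setting mentioned above, here is a small sketch in plain tf.data. It is not the TFDS implementation; the toy pipeline and the make_interleaved helper are assumptions made for the example.

```python
import tensorflow as tf


def make_interleaved(deterministic):
    """Toy pipeline: interleave three small datasets read in parallel."""
    sources = tf.data.Dataset.range(3)
    return sources.interleave(
        lambda i: tf.data.Dataset.from_tensors(i).repeat(2),
        cycle_length=3,
        num_parallel_calls=tf.data.AUTOTUNE,
        # Local setting: affects only this interleave call.
        deterministic=deterministic,
    )


# Deterministic interleave yields elements in round-robin order.
ordered = [int(x) for x in make_interleaved(True).as_numpy_iterator()]
# ordered == [0, 1, 2, 0, 1, 2]

# Locally relaxed ordering: same elements, order not guaranteed.
relaxed = [int(x) for x in make_interleaved(False).as_numpy_iterator()]

# Global setting: applies to every parallel transformation in the pipeline,
# which is the broader behavior the fix above moves away from.
options = tf.data.Options()
options.deterministic = False
global_relaxed = make_interleaved(None).with_options(options)
global_elems = [int(x) for x in global_relaxed.as_numpy_iterator()]
```

Passing deterministic to interleave keeps any later map or other parallel step in the same pipeline on its default ordering behavior, whereas the Options route changes it for all of them.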

As always, thank you to all contributors!