v4.6.0
Added
- Support for community datasets on GCS.
- [API] `tfds.builder_from_directory` and `tfds.builder_from_directories`, see
  https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
- [API] Dash ("-") support in split names.
- [API] `file_format` argument to the `download_and_prepare` method, allowing users
  to specify an alternative file format to store prepared data (e.g. "riegeli").
- [API] `file_format` added to the `DatasetInfo` string representation.
- [API] Expose the return value of Beam pipelines. This allows users to
  read the Beam metrics.
- [API] Expose Feature `tf_example_spec` to public.
- [API] `doc` kwarg on `Feature`s, to describe a feature.
- [Documentation] Features description is shown on TFDS Catalog.
- [Documentation] More metadata about HuggingFace datasets in TFDS catalog.
- [Performance] Parallel load of metadata files.
- [Testing] TFDS tests are now run using GitHub Actions, with misc improvements such
  as caching and sharding.
- [Testing] Improvements to MockFs.
- New datasets.
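
A minimal sketch of how the new `tfds.builder_from_directory` entry point and the `file_format` argument might be used together. The helper names `load_prepared_dataset` and `prepare_with_file_format` are hypothetical wrappers introduced for illustration only, and the directory path in the usage note is a placeholder; only `tfds.builder_from_directory`, `as_dataset` and `download_and_prepare(file_format=...)` come from TFDS itself.

```python
def load_prepared_dataset(builder_dir, split="train"):
    """Hypothetical helper: read an already-prepared dataset from a folder."""
    # Imported lazily so that defining this helper does not require TFDS.
    import tensorflow_datasets as tfds

    # Reconstruct the builder from the metadata stored in `builder_dir`.
    builder = tfds.builder_from_directory(builder_dir)
    return builder.as_dataset(split=split)


def prepare_with_file_format(builder, file_format="riegeli"):
    """Hypothetical helper: prepare a dataset in an alternative file format."""
    # `file_format` is the argument added to `download_and_prepare` in v4.6.0.
    builder.download_and_prepare(file_format=file_format)
    return builder.info
```

Assuming a prepared dataset lives under a placeholder path such as `/path/to/my_dataset/1.0.0`, `load_prepared_dataset("/path/to/my_dataset/1.0.0")` would return a `tf.data.Dataset` for the `train` split. Writing "riegeli" files may require extra dependencies that are not covered here.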
Changed
- [API] `num_shards` is now optional in the shard name.
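
To illustrate what an optional `num_shards` means for code that reads shard names, here is a rough sketch of a tolerant parser. It is not the TFDS implementation: the `ShardName` type, the `parse_shard_name` function and the assumed `<prefix>-00000-of-00005` layout are all hypothetical, chosen only to show both forms being accepted.

```python
import re
from typing import NamedTuple, Optional


class ShardName(NamedTuple):
    """Hypothetical container for the parts of a shard file name."""
    prefix: str
    shard_index: int
    num_shards: Optional[int]  # None when the name omits the total


# Assumed layout: "<prefix>-<index>" with an optional "-of-<total>" suffix.
_SHARD_RE = re.compile(
    r"^(?P<prefix>.+?)-(?P<index>\d{5})(?:-of-(?P<total>\d{5}))?$"
)


def parse_shard_name(filename: str) -> ShardName:
    """Parse a shard file name with or without the shard count."""
    match = _SHARD_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a recognized shard name: {filename!r}")
    total = match.group("total")
    return ShardName(
        prefix=match.group("prefix"),
        shard_index=int(match.group("index")),
        num_shards=int(total) if total is not None else None,
    )
```

With this sketch, `parse_shard_name("mnist-train.tfrecord-00003")` yields a `ShardName` whose `num_shards` is `None`, while a name ending in `-00000-of-00005` yields `num_shards=5`.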
Removed
- TFDS pathlib API, migrated to a self-contained `etils.epath` (see
  https://github.com/google/etils).
Fixed
- Various datasets.
- Dataset builders that are defined ad hoc (e.g. in Colab).
- Better `DatasetNotFoundError` messages.
- Don't set `deterministic` on a global level but locally in interleave, so it
  only applies to interleave and not all transformations.
- Google Drive downloader.
As always, thank you to all contributors!