
Commit 2d24ef6

Merge branch 'master' of https://github.com/tensorflow/datasets into document_dataset_bug
2 parents c0ae7b5 + 813b5d2 commit 2d24ef6


143 files changed: +3306 / -444 lines


docs/api_docs/python/tfds/_api_cache.json

Lines changed: 2 additions & 12 deletions
@@ -139,8 +139,6 @@
 "tfds.core": false,
 "tfds.core.BeamBasedBuilder": false,
 "tfds.core.BeamBasedBuilder.BUILDER_CONFIGS": true,
-"tfds.core.BeamBasedBuilder.GOOGLE_DISABLED": true,
-"tfds.core.BeamBasedBuilder.IN_DEVELOPMENT": true,
 "tfds.core.BeamBasedBuilder.SUPPORTED_VERSIONS": true,
 "tfds.core.BeamBasedBuilder.VERSION": true,
 "tfds.core.BeamBasedBuilder.__init__": true,
@@ -160,8 +158,6 @@
 "tfds.core.BuilderConfig.version": true,
 "tfds.core.DatasetBuilder": false,
 "tfds.core.DatasetBuilder.BUILDER_CONFIGS": true,
-"tfds.core.DatasetBuilder.GOOGLE_DISABLED": true,
-"tfds.core.DatasetBuilder.IN_DEVELOPMENT": true,
 "tfds.core.DatasetBuilder.SUPPORTED_VERSIONS": true,
 "tfds.core.DatasetBuilder.VERSION": true,
 "tfds.core.DatasetBuilder.__init__": true,
@@ -197,10 +193,9 @@
 "tfds.core.DatasetInfo.write_to_directory": true,
 "tfds.core.Experiment": false,
 "tfds.core.Experiment.DUMMY": true,
+"tfds.core.Experiment.S3": true,
 "tfds.core.GeneratorBasedBuilder": false,
 "tfds.core.GeneratorBasedBuilder.BUILDER_CONFIGS": true,
-"tfds.core.GeneratorBasedBuilder.GOOGLE_DISABLED": true,
-"tfds.core.GeneratorBasedBuilder.IN_DEVELOPMENT": true,
 "tfds.core.GeneratorBasedBuilder.SUPPORTED_VERSIONS": true,
 "tfds.core.GeneratorBasedBuilder.VERSION": true,
 "tfds.core.GeneratorBasedBuilder.__init__": true,
@@ -598,10 +593,10 @@
 "tfds.testing.DatasetBuilderTestCase.DATASET_CLASS": true,
 "tfds.testing.DatasetBuilderTestCase.DL_EXTRACT_RESULT": true,
 "tfds.testing.DatasetBuilderTestCase.EXAMPLE_DIR": true,
-"tfds.testing.DatasetBuilderTestCase.INTERNAL_DATASET": true,
 "tfds.testing.DatasetBuilderTestCase.MOCK_MONARCH": true,
 "tfds.testing.DatasetBuilderTestCase.MOCK_OUT_FORBIDDEN_OS_FUNCTIONS": true,
 "tfds.testing.DatasetBuilderTestCase.OVERLAPPING_SPLITS": true,
+"tfds.testing.DatasetBuilderTestCase.VERSION": true,
 "tfds.testing.DatasetBuilderTestCase.__call__": true,
 "tfds.testing.DatasetBuilderTestCase.__eq__": true,
 "tfds.testing.DatasetBuilderTestCase.__init__": true,
@@ -743,8 +738,6 @@
 "tfds.testing.DatasetBuilderTestCase.test_session": true,
 "tfds.testing.DummyDatasetSharedGenerator": false,
 "tfds.testing.DummyDatasetSharedGenerator.BUILDER_CONFIGS": true,
-"tfds.testing.DummyDatasetSharedGenerator.GOOGLE_DISABLED": true,
-"tfds.testing.DummyDatasetSharedGenerator.IN_DEVELOPMENT": true,
 "tfds.testing.DummyDatasetSharedGenerator.SUPPORTED_VERSIONS": true,
 "tfds.testing.DummyDatasetSharedGenerator.VERSION": true,
 "tfds.testing.DummyDatasetSharedGenerator.__init__": true,
@@ -758,8 +751,6 @@
 "tfds.testing.DummyDatasetSharedGenerator.version": true,
 "tfds.testing.DummyMnist": false,
 "tfds.testing.DummyMnist.BUILDER_CONFIGS": true,
-"tfds.testing.DummyMnist.GOOGLE_DISABLED": true,
-"tfds.testing.DummyMnist.IN_DEVELOPMENT": true,
 "tfds.testing.DummyMnist.SUPPORTED_VERSIONS": true,
 "tfds.testing.DummyMnist.VERSION": true,
 "tfds.testing.DummyMnist.__init__": true,
@@ -1197,7 +1188,6 @@
 "tfds.units.MiB": true,
 "tfds.units.PiB": true,
 "tfds.units.TiB": true,
-"tfds.units.absolute_import": true,
 "tfds.units.division": true,
 "tfds.units.print_function": true,
 "tfds.units.size_str": false
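Taken together, the `_api_cache.json` hunks amount to a set difference over symbol names. The `GOOGLE_DISABLED`/`IN_DEVELOPMENT` members, `INTERNAL_DATASET` and `tfds.units.absolute_import` disappear, and `tfds.core.Experiment.S3` and `tfds.testing.DatasetBuilderTestCase.VERSION` appear. The following is a minimal sketch of computing that summary with the standard `json` module. It assumes two small, hand-copied excerpts of the cache: `OLD` and `NEW` are illustrative, not the real files, and the boolean values are not interpreted here.

```python
import json

# Illustrative excerpts of the cache before and after the commit. The keys are
# copied from the hunks above; the real files contain many more entries.
OLD = json.loads("""{
  "tfds.core.Experiment.DUMMY": true,
  "tfds.core.DatasetBuilder.GOOGLE_DISABLED": true,
  "tfds.core.DatasetBuilder.IN_DEVELOPMENT": true,
  "tfds.units.absolute_import": true
}""")
NEW = json.loads("""{
  "tfds.core.Experiment.DUMMY": true,
  "tfds.core.Experiment.S3": true,
  "tfds.testing.DatasetBuilderTestCase.VERSION": true
}""")


def diff_symbols(old, new):
    """Return (added, removed) symbol names as sorted lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    return added, removed


added, removed = diff_symbols(OLD, NEW)
print("added:", added)
print("removed:", removed)
```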

docs/api_docs/python/tfds/core/BeamBasedBuilder.md

Lines changed: 6 additions & 5 deletions
@@ -9,8 +9,6 @@
 <meta itemprop="property" content="as_dataset"/>
 <meta itemprop="property" content="download_and_prepare"/>
 <meta itemprop="property" content="BUILDER_CONFIGS"/>
-<meta itemprop="property" content="GOOGLE_DISABLED"/>
-<meta itemprop="property" content="IN_DEVELOPMENT"/>
 <meta itemprop="property" content="SUPPORTED_VERSIONS"/>
 <meta itemprop="property" content="VERSION"/>
 <meta itemprop="property" content="builder_configs"/>
@@ -80,7 +78,8 @@ as_dataset(
 split=None,
 batch_size=None,
 shuffle_files=None,
-as_supervised=False
+as_supervised=False,
+in_memory=None
 )
 ```

@@ -105,6 +104,10 @@ Callers must pass arguments as keyword arguments.
 will have a 2-tuple structure `(input, label)` according to
 `builder.info.supervised_keys`. If `False`, the default, the returned
 `tf.data.Dataset` will have a dictionary with all the features.
+* <b>`in_memory`</b>: `bool`, if `True`, loads the dataset in memory which
+increases iteration speeds. Note that if `True` and the dataset has unknown
+dimensions, the features will be padded to the maximum size across the
+dataset.

 #### Returns:

@@ -142,8 +145,6 @@ Downloads and prepares dataset for reading.
 ## Class Members

 * `BUILDER_CONFIGS` <a id="BUILDER_CONFIGS"></a>
-* `GOOGLE_DISABLED = False` <a id="GOOGLE_DISABLED"></a>
-* `IN_DEVELOPMENT = False` <a id="IN_DEVELOPMENT"></a>
 * `SUPPORTED_VERSIONS` <a id="SUPPORTED_VERSIONS"></a>
 * `VERSION = None` <a id="VERSION"></a>
 * `builder_configs` <a id="builder_configs"></a>
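The `as_dataset` docs state that callers must pass arguments as keyword arguments. One plain-Python way to get that behaviour is a bare `*` in the signature. The sketch below uses a hypothetical `SketchBuilder` class, not the tfds builder. The rendered signature above shows no `*`, so tfds may enforce the rule by other means.

```python
class SketchBuilder:
    """Hypothetical stand-in for a dataset builder; not the tfds class."""

    def as_dataset(self, *, split=None, batch_size=None, shuffle_files=None,
                   as_supervised=False, in_memory=None):
        # The bare `*` makes every parameter keyword-only, so a positional
        # call such as as_dataset("train") raises TypeError.
        # This stub only echoes the arguments it received.
        return {
            "split": split,
            "batch_size": batch_size,
            "shuffle_files": shuffle_files,
            "as_supervised": as_supervised,
            "in_memory": in_memory,
        }


builder = SketchBuilder()
kwargs_result = builder.as_dataset(split="train", in_memory=True)
print(kwargs_result["in_memory"])  # True

try:
    builder.as_dataset("train")
    positional_ok = True
except TypeError:
    positional_ok = False
print(positional_ok)  # False
```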

docs/api_docs/python/tfds/core/DatasetBuilder.md

Lines changed: 6 additions & 5 deletions
@@ -9,8 +9,6 @@
 <meta itemprop="property" content="as_dataset"/>
 <meta itemprop="property" content="download_and_prepare"/>
 <meta itemprop="property" content="BUILDER_CONFIGS"/>
-<meta itemprop="property" content="GOOGLE_DISABLED"/>
-<meta itemprop="property" content="IN_DEVELOPMENT"/>
 <meta itemprop="property" content="SUPPORTED_VERSIONS"/>
 <meta itemprop="property" content="VERSION"/>
 <meta itemprop="property" content="builder_configs"/>
@@ -111,7 +109,8 @@ as_dataset(
 split=None,
 batch_size=None,
 shuffle_files=None,
-as_supervised=False
+as_supervised=False,
+in_memory=None
 )
 ```

@@ -136,6 +135,10 @@ Callers must pass arguments as keyword arguments.
 will have a 2-tuple structure `(input, label)` according to
 `builder.info.supervised_keys`. If `False`, the default, the returned
 `tf.data.Dataset` will have a dictionary with all the features.
+* <b>`in_memory`</b>: `bool`, if `True`, loads the dataset in memory which
+increases iteration speeds. Note that if `True` and the dataset has unknown
+dimensions, the features will be padded to the maximum size across the
+dataset.

 #### Returns:

@@ -173,8 +176,6 @@ Downloads and prepares dataset for reading.
 ## Class Members

 * `BUILDER_CONFIGS` <a id="BUILDER_CONFIGS"></a>
-* `GOOGLE_DISABLED = False` <a id="GOOGLE_DISABLED"></a>
-* `IN_DEVELOPMENT = False` <a id="IN_DEVELOPMENT"></a>
 * `SUPPORTED_VERSIONS` <a id="SUPPORTED_VERSIONS"></a>
 * `VERSION = None` <a id="VERSION"></a>
 * `builder_configs` <a id="builder_configs"></a>
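The new `in_memory` note says that features with unknown dimensions are padded to the maximum size across the dataset. For a 1-D variable-length feature, every example grows to the length of the longest example. The following is a pure-Python illustration, assuming a pad value of `0`. The docs do not name the pad value, and the real code operates on tensors.

```python
def pad_to_max(examples, pad_value=0):
    """Pad variable-length 1-D features to the longest length in `examples`."""
    if not examples:
        return []
    max_len = max(len(x) for x in examples)
    # Every example is extended to max_len; already-longest ones are unchanged.
    return [list(x) + [pad_value] * (max_len - len(x)) for x in examples]


# Three examples of a feature whose length is not known ahead of time.
ragged = [[5, 3], [7], [1, 2, 4, 8]]
print(pad_to_max(ragged))
# → [[5, 3, 0, 0], [7, 0, 0, 0], [1, 2, 4, 8]]
```

The padded copy holds `num_examples * max_len` values, so a single unusually long example can increase memory use for the whole dataset.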

docs/api_docs/python/tfds/core/Experiment.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,7 @@
 <meta itemprop="name" content="tfds.core.Experiment" />
 <meta itemprop="path" content="Stable" />
 <meta itemprop="property" content="DUMMY"/>
+<meta itemprop="property" content="S3"/>
 </div>

 # tfds.core.Experiment
@@ -30,3 +31,4 @@ tfds.core.Experiment.EXP_A: True, })
 ## Class Members

 * `DUMMY` <a id="DUMMY"></a>
+* `S3` <a id="S3"></a>
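The `Experiment` page lists `DUMMY` and, after this commit, `S3`. Its hunk header shows experiments being switched on through a mapping such as `{tfds.core.Experiment.EXP_A: True, }`. The following is a minimal sketch of that enum-plus-dict pattern. The `Experiment` values and the `is_enabled` helper below are hypothetical, and this page does not say what `S3` controls.

```python
import enum


class Experiment(enum.Enum):
    """Hypothetical mirror of the documented members; the values are assumptions."""
    DUMMY = 0
    S3 = 1


def is_enabled(experiment, flags):
    """Hypothetical helper: look up an experiment in a {Experiment: bool} mapping."""
    # Experiments missing from the mapping are treated as disabled.
    return bool(flags.get(experiment, False))


flags = {Experiment.S3: True}
print(is_enabled(Experiment.S3, flags))     # True
print(is_enabled(Experiment.DUMMY, flags))  # False
```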

docs/api_docs/python/tfds/core/GeneratorBasedBuilder.md

Lines changed: 6 additions & 5 deletions
@@ -9,8 +9,6 @@
 <meta itemprop="property" content="as_dataset"/>
 <meta itemprop="property" content="download_and_prepare"/>
 <meta itemprop="property" content="BUILDER_CONFIGS"/>
-<meta itemprop="property" content="GOOGLE_DISABLED"/>
-<meta itemprop="property" content="IN_DEVELOPMENT"/>
 <meta itemprop="property" content="SUPPORTED_VERSIONS"/>
 <meta itemprop="property" content="VERSION"/>
 <meta itemprop="property" content="builder_configs"/>
@@ -89,7 +87,8 @@ as_dataset(
 split=None,
 batch_size=None,
 shuffle_files=None,
-as_supervised=False
+as_supervised=False,
+in_memory=None
 )
 ```

@@ -114,6 +113,10 @@ Callers must pass arguments as keyword arguments.
 will have a 2-tuple structure `(input, label)` according to
 `builder.info.supervised_keys`. If `False`, the default, the returned
 `tf.data.Dataset` will have a dictionary with all the features.
+* <b>`in_memory`</b>: `bool`, if `True`, loads the dataset in memory which
+increases iteration speeds. Note that if `True` and the dataset has unknown
+dimensions, the features will be padded to the maximum size across the
+dataset.

 #### Returns:

@@ -151,8 +154,6 @@ Downloads and prepares dataset for reading.
 ## Class Members

 * `BUILDER_CONFIGS` <a id="BUILDER_CONFIGS"></a>
-* `GOOGLE_DISABLED = False` <a id="GOOGLE_DISABLED"></a>
-* `IN_DEVELOPMENT = False` <a id="IN_DEVELOPMENT"></a>
 * `SUPPORTED_VERSIONS` <a id="SUPPORTED_VERSIONS"></a>
 * `VERSION = None` <a id="VERSION"></a>
 * `builder_configs` <a id="builder_configs"></a>

docs/api_docs/python/tfds/disable_progress_bar.md

Lines changed: 7 additions & 1 deletion
@@ -14,7 +14,13 @@ tfds.disable_progress_bar()
 Defined in
 [`core/utils/tqdm_utils.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/utils/tqdm_utils.py).

-<!-- Placeholder for "Used in" -->
+### Used in the tutorials:
+
+* [CycleGAN](https://www.tensorflow.org/beta/tutorials/generative/cyclegan)
+* [Distributed training with Keras](https://www.tensorflow.org/beta/tutorials/distribute/keras)
+* [Multi-worker Training with Estimator](https://www.tensorflow.org/beta/tutorials/distribute/multi_worker_with_estimator)
+* [Multi-worker Training with Keras](https://www.tensorflow.org/beta/tutorials/distribute/multi_worker_with_keras)
+* [Transfer Learning Using Pretrained ConvNets](https://www.tensorflow.org/beta/tutorials/images/transfer_learning)

 #### Usage:

docs/api_docs/python/tfds/features/Sequence.md

Lines changed: 24 additions & 8 deletions
@@ -43,16 +43,32 @@ Note that `SequenceDict` do not support features which are of type
 tfds.features.Sequence(tfds.features.Image(), length=NB_FRAME)
 ```

-or: `tfds.features.Sequence({ 'frame': tfds.features.Image(shape=(64, 64, 3))
-'action': tfds.features.ClassLabel(['up', 'down', 'left', 'right']) },
-length=NB_FRAME)`
+or:

-During data generation: `yield { 'frame': np.ones(shape=(NB_FRAME, 64, 64, 3)),
-'action': ['left', 'left', 'up', ...], }`
+```
+tfds.features.Sequence({
+'frame': tfds.features.Image(shape=(64, 64, 3))
+'action': tfds.features.ClassLabel(['up', 'down', 'left', 'right'])
+}, length=NB_FRAME)
+```
+
+During data generation:

-Tensor returned by `.as_dataset()`: `{ 'frame': tf.Tensor(shape=(NB_FRAME, 64,
-64, 3), dtype=tf.uint8), 'action': tf.Tensor(shape=(NB_FRAME,), dtype=tf.int64),
-}`
+```
+yield {
+'frame': np.ones(shape=(NB_FRAME, 64, 64, 3)),
+'action': ['left', 'left', 'up', ...],
+}
+```
+
+Tensor returned by `.as_dataset()`:
+
+```
+{
+'frame': tf.Tensor(shape=(NB_FRAME, 64, 64, 3), dtype=tf.uint8),
+'action': tf.Tensor(shape=(NB_FRAME,), dtype=tf.int64),
+}
+```

 At generation time, you can specify a list of features dict, a dict of list
 values or a stacked numpy array. The lists will automatically be distributed
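According to the `Sequence` docs, generation-time data can be either a list of per-step feature dicts or a dict of per-feature lists. The following is a pure-Python sketch of converting the first layout into the second, which is the layout shown in the `yield` example. String placeholders stand in for image arrays, and `list_of_dicts_to_dict_of_lists` is a hypothetical helper, not part of tfds. Separately, the added `tfds.features.Sequence({...})` snippet lacks a comma after `tfds.features.Image(shape=(64, 64, 3))`, so it raises a `SyntaxError` if copied verbatim.

```python
def list_of_dicts_to_dict_of_lists(steps):
    """Turn [{'frame': f0, 'action': a0}, ...] into {'frame': [f0, ...], 'action': [a0, ...]}."""
    if not steps:
        return {}
    keys = steps[0].keys()
    # Every step must describe the same set of features.
    for step in steps:
        if step.keys() != keys:
            raise ValueError("every step must have the same feature keys")
    return {k: [step[k] for step in steps] for k in keys}


steps = [
    {"frame": "f0", "action": "left"},
    {"frame": "f1", "action": "left"},
    {"frame": "f2", "action": "up"},
]
print(list_of_dicts_to_dict_of_lists(steps))
# → {'frame': ['f0', 'f1', 'f2'], 'action': ['left', 'left', 'up']}
```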

docs/api_docs/python/tfds/features/text/SubwordTextEncoder.md

Lines changed: 3 additions & 1 deletion
@@ -23,7 +23,9 @@ Inherits From: [`TextEncoder`](../../../tfds/features/text/TextEncoder.md)

 Defined in [`core/features/text/subword_text_encoder.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/features/text/subword_text_encoder.py).

-<!-- Placeholder for "Used in" -->
+### Used in the tutorials:
+
+* [Transformer model for language understanding](https://www.tensorflow.org/beta/tutorials/text/transformer)

 Encoding is fully invertible because all out-of-vocab wordpieces are
 byte-encoded.
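The `SubwordTextEncoder` docs attribute full invertibility to the byte-encoding of every out-of-vocab wordpiece. The following is a simplified sketch of that idea. It uses greedy longest-match over a toy vocabulary and emits any uncovered character as its UTF-8 bytes. The id layout (vocabulary ids first, then 256 byte ids) and the greedy matching are assumptions for illustration. They do not describe how tfds builds or applies its vocabulary.

```python
def encode(text, vocab):
    """Greedy longest-match over `vocab`; uncovered characters become UTF-8 bytes."""
    index = {piece: i for i, piece in enumerate(vocab)}
    ids, i = [], 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in index:
                ids.append(index[text[i:j]])
                i = j
                break
        else:
            # No vocabulary piece starts here: emit one id per UTF-8 byte,
            # offset so byte ids never collide with vocabulary ids.
            ids.extend(len(vocab) + b for b in text[i].encode("utf-8"))
            i += 1
    return ids


def decode(ids, vocab):
    """Invert `encode`: vocabulary ids map to pieces, byte ids are reassembled."""
    pieces, pending = [], bytearray()
    for t in ids:
        if t >= len(vocab):
            pending.append(t - len(vocab))
        else:
            if pending:
                pieces.append(pending.decode("utf-8"))
                pending = bytearray()
            pieces.append(vocab[t])
    if pending:
        pieces.append(pending.decode("utf-8"))
    return "".join(pieces)


VOCAB = ["hello", " ", "world", "wor", "ld"]
text = "hello wörld!"
ids = encode(text, VOCAB)
print(ids)
print(decode(ids, VOCAB) == text)  # True
```

Because every character can fall back to bytes, decoding never meets an id that it cannot map back to text. This is the property that the docs attribute to byte-encoding.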

docs/api_docs/python/tfds/features/text/TokenTextEncoder.md

Lines changed: 7 additions & 3 deletions
@@ -25,7 +25,9 @@ Inherits From: [`TextEncoder`](../../../tfds/features/text/TextEncoder.md)

 Defined in [`core/features/text/text_encoder.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/features/text/text_encoder.py).

-<!-- Placeholder for "Used in" -->
+### Used in the tutorials:
+
+* [Load text with tf.data](https://www.tensorflow.org/beta/tutorials/load_data/text)

 Tokenization splits on (and drops) non-alphanumeric characters with
 regex "\W+".
@@ -44,8 +46,10 @@ __init__(

 Constructs a TokenTextEncoder.

-To load from a file saved with `TokenTextEncoder.save_to_file`, use
-`TokenTextEncoder.load_from_file`.
+To load from a file saved with
+<a href="../../../tfds/features/text/TokenTextEncoder.md#save_to_file"><code>TokenTextEncoder.save_to_file</code></a>,
+use
+<a href="../../../tfds/features/text/TokenTextEncoder.md#load_from_file"><code>TokenTextEncoder.load_from_file</code></a>.

 #### Args:

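According to the `TokenTextEncoder` docs, tokenization splits on runs of non-alphanumeric characters matched by `\W+` and drops them. The following is a sketch of that rule feeding a toy vocabulary lookup. The id scheme (0 reserved, vocabulary ids starting at 1, one shared out-of-vocab id) and the `UNK` decode token are assumptions. Unlike the byte-fallback approach described for `SubwordTextEncoder`, decoding here cannot restore the dropped punctuation.

```python
import re


def tokenize(text):
    """Split on runs of non-word characters and drop the resulting empty strings."""
    return [tok for tok in re.split(r"\W+", text) if tok]


def encode(text, vocab):
    # Hypothetical id scheme: 0 is reserved, vocabulary ids start at 1, and
    # every out-of-vocab token shares the id len(vocab) + 1.
    index = {tok: i + 1 for i, tok in enumerate(vocab)}
    oov_id = len(vocab) + 1
    return [index.get(tok, oov_id) for tok in tokenize(text)]


def decode(ids, vocab, oov_token="UNK"):
    # Tokens are rejoined with single spaces; punctuation removed by
    # tokenize() cannot be recovered.
    inverse = {i + 1: tok for i, tok in enumerate(vocab)}
    return " ".join(inverse.get(i, oov_token) for i in ids)


VOCAB = ["hello", "world"]
ids = encode("hello, world! bye", VOCAB)
print(tokenize("hello, world! bye"))  # ['hello', 'world', 'bye']
print(ids)                            # [1, 2, 3]
print(decode(ids, VOCAB))             # hello world UNK
```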

docs/api_docs/python/tfds/features/text/Tokenizer.md

Lines changed: 2 additions & 1 deletion
@@ -18,8 +18,9 @@ Splits a string into tokens, and joins them back.

 Defined in [`core/features/text/text_encoder.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/features/text/text_encoder.py).

-<!-- Placeholder for "Used in" -->
+### Used in the tutorials:

+* [Load text with tf.data](https://www.tensorflow.org/beta/tutorials/load_data/text)

 <h2 id="__init__"><code>__init__</code></h2>

