Commit 6643cf2

Merge branch 'master' into master
Parents: 9e11471 + 9dd9d66

1,903 files changed: +187984 −8735 lines


.github/pull_request_template.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Add Dataset
+
+* Dataset Name: <name>
+* Issue Reference: <link>
+* `dataset_info.json` Gist: <link>
+
+## Description
+
+<description>
+
+## Checklist
+* [ ] Address all TODO's
+* [ ] Add alphabetized import to subdirectory's `__init__.py`
+* [ ] Run `download_and_prepare` successfully
+* [ ] Add checksums file
+* [ ] Properly cite in `BibTeX` format
+* [ ] Add passing test(s)
+* [ ] Add test data
+* [ ] Add data generation script (if applicable)
+* [ ] Lint code

README.md

Lines changed: 31 additions & 28 deletions
@@ -46,8 +46,7 @@ to receive updates on the project.
 import tensorflow_datasets as tfds
 import tensorflow as tf
 
-# tfds works in both Eager and Graph modes
-tf.compat.v1.enable_eager_execution()
+# Here we assume Eager mode is enabled (TF2), but tfds also works in Graph mode.
 
 # See available datasets
 print(tfds.list_builders())
@@ -92,32 +91,36 @@ ds = mnist_builder.as_dataset(split='train')
 # dataset and its features
 info = mnist_builder.info
 print(info)
+```
+
+This will print the dataset info content:
 
-tfds.core.DatasetInfo(
-    name='mnist',
-    version=1.0.0,
-    description='The MNIST database of handwritten digits.',
-    homepage='http://yann.lecun.com/exdb/mnist/',
-    features=FeaturesDict({
-        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
-        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
-    },
-    total_num_examples=70000,
-    splits={
-        'test': <tfds.core.SplitInfo num_examples=10000>,
-        'train': <tfds.core.SplitInfo num_examples=60000>
-    },
-    supervised_keys=('image', 'label'),
-    citation='"""
-        @article{lecun2010mnist,
-          title={MNIST handwritten digit database},
-          author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
-          journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
-          volume={2},
-          year={2010}
-        }
-    """',
-)
+```
+tfds.core.DatasetInfo(
+    name='mnist',
+    version=1.0.0,
+    description='The MNIST database of handwritten digits.',
+    homepage='http://yann.lecun.com/exdb/mnist/',
+    features=FeaturesDict({
+        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
+        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
+    },
+    total_num_examples=70000,
+    splits={
+        'test': <tfds.core.SplitInfo num_examples=10000>,
+        'train': <tfds.core.SplitInfo num_examples=60000>
+    },
+    supervised_keys=('image', 'label'),
+    citation='"""
+        @article{lecun2010mnist,
+          title={MNIST handwritten digit database},
+          author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
+          journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
+          volume={2},
+          year={2010}
+        }
+    """',
+)
 ```
 
 You can also get details about the classes (number of classes and their names).
@@ -142,7 +145,7 @@ input pipelines with `tf.data` but use whatever you'd like for your model
 components.
 
 ```python
-train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
+train_ds = tfds.load("mnist", split="train")
 train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
 for example in tfds.as_numpy(train_ds):
   numpy_images, numpy_labels = example["image"], example["label"]
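
For reference, the updated README snippets compose into the following end-to-end pipeline — a minimal sketch assuming TF 2.x (or TF 1.x with eager execution enabled), built entirely from the code shown in this diff:

```python
import tensorflow as tf  # TF 2.x: eager execution is on by default
import tensorflow_datasets as tfds

# See available datasets
print(tfds.list_builders())

# String split names ("train") replace the tfds.Split.TRAIN enum removed here
train_ds = tfds.load("mnist", split="train")
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)

# tfds.as_numpy converts the tf.data.Dataset into an iterable of NumPy dicts
for example in tfds.as_numpy(train_ds):
    numpy_images, numpy_labels = example["image"], example["label"]
```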

docs/_book.yaml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ upper_tabs:
       path: /datasets/add_dataset
     - title: Feature decoding
      path: /datasets/decode
-    - title: Add huge datasets
+    - title: Add huge datasets (Apache Beam)
      path: /datasets/beam_datasets
    - title: Store your dataset on GCS
      path: /datasets/gcs

docs/_index.ipynb

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@
 "from __future__ import division\n",
 "from __future__ import print_function\n",
 "\n",
-"import tensorflow as tf\n",
+"import tensorflow.compat.v2 as tf\n",
 "import tensorflow_datasets as tfds\n",
 "\n",
 "# tfds works in both Eager and Graph modes\n",
@@ -47,7 +47,7 @@
 "print(tfds.list_builders())\n",
 "\n",
 "# Construct a tf.data.Dataset\n",
-"dataset = tfds.load(name=\"mnist\", split=tfds.Split.TRAIN)\n",
+"dataset = tfds.load(name=\"mnist\", split=\"train\")\n",
 "\n",
 "# Build your input pipeline\n",
 "dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)\n",

docs/_index.yaml

Lines changed: 4 additions & 4 deletions
@@ -23,7 +23,7 @@ landing_page:
           <a href="./datasets">list of datasets</a>.
       - code_block: |
           <pre class = "prettyprint">
-          import tensorflow as tf
+          import tensorflow.compat.v2 as tf
           import tensorflow_datasets as tfds
 
           # tfds works in both Eager and Graph modes
@@ -33,7 +33,7 @@ landing_page:
           print(tfds.list_builders())
 
           # Construct a tf.data.Dataset
-          dataset = tfds.load(name="mnist", split=tfds.Split.TRAIN)
+          dataset = tfds.load(name="mnist", split="train")
 
           # Build your input pipeline
           dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
@@ -48,10 +48,10 @@ landing_page:
     items:
     - heading: Introducing TensorFlow Datasets
       image_path: /resources/images/tf-logo-card-16x9.png
-      path: https://github.com/tensorflow/datasets/blob/master/docs/announce_proxy.md
+      path: https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html
       buttons:
       - label: Read on TensorFlow Blog
-        path: https://github.com/tensorflow/datasets/blob/master/docs/announce_proxy.md
+        path: https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html
     - heading: TensorFlow Datasets on GitHub
       image_path: /resources/images/github-card-16x9.png
       path: https://github.com/tensorflow/datasets

docs/add_dataset.md

Lines changed: 15 additions & 15 deletions
@@ -6,10 +6,10 @@ See our [list of datasets](catalog/overview.md) to see if the dataset you want
 isn't already added.
 
 *   [Overview](#overview)
-*   [Writing `my_dataset.py`](#writing-my-datasetpy)
+*   [Writing `my_dataset.py`](#writing-my_datasetpy)
     *   [Use the default template](#use-the-default-template)
-    *   [DatasetBuilder](#datasetbuilde)
-    *   [my_dataset.py](#my-datasetpy)
+    *   [DatasetBuilder](#datasetbuilder)
+    *   [my_dataset.py](#my_datasetpy)
 *   [Specifying `DatasetInfo`](#specifying-datasetinfo)
 *   [`FeatureConnector`s](#featureconnectors)
 *   [Downloading and extracting source data](#downloading-and-extracting-source-data)
@@ -30,8 +30,7 @@ isn't already added.
 *   [3. Double-check the citation](#3-double-check-the-citation)
 *   [4. Add a test](#4-add-a-test)
 *   [5. Check your code style](#5-check-your-code-style)
-*   [6. Add release notes](#6-add-release-notes)
-*   [7. Send for review!](#7-send-for-review)
+*   [6. Send for review!](#6-send-for-review)
 *   [Define the dataset outside TFDS](#define-the-dataset-outside-tfds)
 *   [Large datasets and distributed generation](#large-datasets-and-distributed-generation)
 *   [Testing `MyDataset`](#testing-mydataset)
@@ -543,7 +542,7 @@ except TensorFlow uses 2 spaces instead of 4. Please conform to the
 [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md),
 
 Most importantly, use
-[`tensorflow_datasets/oss_scripts/lint.sh`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/oss_scripts/lint.sh)
+[`tensorflow_datasets/oss_scripts/lint.sh`](https://github.com/tensorflow/datasets/tree/master/oss_scripts/lint.sh)
 to ensure your code is properly formatted. For example, to lint the `image`
 directory:
 
@@ -555,16 +554,14 @@ See
 [TensorFlow code style guide](https://www.tensorflow.org/community/contribute/code_style)
 for more information.
 
-### 6. Add release notes
-
-Add the dataset to the
-[release notes](https://github.com/tensorflow/datasets/tree/master/docs/release_notes.md).
-The release note will be published for the next release.
-
-### 7. Send for review!
+### 6. Send for review!
 
 Send the pull request for review.
 
+When creating the pull request, fill in the areas for the name, issue reference,
+and GitHub Gist link. When using the checklist, replace each `[ ]` with `[x]` to
+mark it off.
+
 
 ## Define the dataset outside TFDS.
 
@@ -586,7 +583,7 @@ To create this checksum file the first time, you can use the
 `tensorflow_datasets.scripts.download_and_prepare` script and pass the flags
 `--register_checksums --checksums_dir=/path/to/checksums_dir`.
 
-### 2. Adjust the fake example direcory
+### 2. Adjust the fake example directory
 
 For testing, instead of using the default
 [fake example directory](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/test_data/fake_examples)
@@ -595,7 +592,7 @@ you can define your own by setting the `EXAMPLE_DIR` property of
 
 ```
 class MyDatasetTest(tfds.testing.DatasetBuilderTestCase):
-  EXAMPLE_DIR = 'path/to/fakedata'`
+  EXAMPLE_DIR = 'path/to/fakedata'
 ```
 
 ## Large datasets and distributed generation
@@ -617,6 +614,9 @@ as downloaded and extracted. It can be created manually or automatically with a
 script
 ([example script](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/cifar.py)).
 
+If you're using automation to generate the test data, please include that script
+in [`testing`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing).
+
 Make sure to use different data in your test data splits, as the test will
 fail if your dataset splits overlap.
 
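
Putting the testing guidance above together, a fuller version of the `EXAMPLE_DIR` snippet might look like the following — a hypothetical sketch in which the dataset module `my_dataset`, the split counts, and the fake-data path are all placeholders to adapt:

```python
import tensorflow_datasets as tfds
# Hypothetical module containing the builder under test; substitute your own.
from tensorflow_datasets.image import my_dataset


class MyDatasetTest(tfds.testing.DatasetBuilderTestCase):
  DATASET_CLASS = my_dataset.MyDataset
  # Expected number of fake examples per split; use distinct data in each
  # split, since the test fails when dataset splits overlap (see above).
  SPLITS = {"train": 3, "test": 1}
  # Overrides the default fake_examples directory.
  EXAMPLE_DIR = "path/to/fakedata"


if __name__ == "__main__":
  tfds.testing.test_main()
```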

docs/api_docs/python/_toc.yaml

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ toc:
     path: /datasets/api_docs/python/tfds/load
   - title: percent
     path: /datasets/api_docs/python/tfds/percent
+  - title: ReadConfig
+    path: /datasets/api_docs/python/tfds/ReadConfig
   - title: show_examples
     path: /datasets/api_docs/python/tfds/show_examples
   - title: Split

docs/api_docs/python/index.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 
 *   <a href="./tfds.md"><code>tfds</code></a>
 *   <a href="./tfds/download/GenerateMode.md"><code>tfds.GenerateMode</code></a>
+*   <a href="./tfds/ReadConfig.md"><code>tfds.ReadConfig</code></a>
 *   <a href="./tfds/Split.md"><code>tfds.Split</code></a>
 *   <a href="./tfds/as_numpy.md"><code>tfds.as_numpy</code></a>
 *   <a href="./tfds/builder.md"><code>tfds.builder</code></a>

docs/api_docs/python/tfds.md

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,8 @@ converting various units.
 
 [`class GenerateMode`](./tfds/download/GenerateMode.md): `Enum` for how to treat pre-existing downloads and data.
 
+[`class ReadConfig`](./tfds/ReadConfig.md): Configures input reading pipeline.
+
 [`class Split`](./tfds/Split.md): `Enum` for dataset splits.
 
 [`class percent`](./tfds/percent.md): Syntactic sugar for defining slice subsplits: `tfds.percent[75:-5]`.
docs/api_docs/python/tfds/ReadConfig.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+<div itemscope itemtype="http://developers.google.com/ReferenceObject">
+<meta itemprop="name" content="tfds.ReadConfig" />
+<meta itemprop="path" content="Stable" />
+<meta itemprop="property" content="__eq__"/>
+<meta itemprop="property" content="__ge__"/>
+<meta itemprop="property" content="__gt__"/>
+<meta itemprop="property" content="__init__"/>
+<meta itemprop="property" content="__le__"/>
+<meta itemprop="property" content="__lt__"/>
+<meta itemprop="property" content="__ne__"/>
+</div>
+
+# tfds.ReadConfig
+
+<!-- Insert buttons and diff -->
+
+<table class="tfo-notebook-buttons tfo-api" align="left">
+</table>
+
+<a target="_blank" href="https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/utils/read_config.py">View
+source</a>
+
+## Class `ReadConfig`
+
+Configures input reading pipeline.
+
+<!-- Placeholder for "Used in" -->
+
+#### Attributes:
+
+*   <b>`options`</b>: `tf.data.Options()`, dataset options. Those options are
+    added to the default values defined in `tfrecord_reader.py`. Note that when
+    `shuffle_files` is True and no seed is defined, experimental_deterministic
+    will be set to False internally, unless it is defined here.
+*   <b>`shuffle_seed`</b>: `tf.int64`, seeds forwarded to
+    `tf.data.Dataset.shuffle` when `shuffle_files=True`.
+*   <b>`shuffle_reshuffle_each_iteration`</b>: `bool`, forwarded to
+    `tf.data.Dataset.shuffle` when `shuffle_files=True`.
+*   <b>`interleave_parallel_reads`</b>: `int`, forwarded to
+    `tf.data.Dataset.interleave`. Default to 16.
+*   <b>`interleave_block_length`</b>: `int`, forwarded to
+    `tf.data.Dataset.interleave`. Default to 16.
+*   <b>`experimental_interleave_sort_fn`</b>: Function with signature
+    `List[FileDict] -> List[FileDict]`, which takes the list of `dict(file: str,
+    take: int, skip: int)` and returns the modified version to read. This can be
+    used to sort/shuffle the shards to read in a custom order, instead of
+    relying on `shuffle_files=True`.
+
+<h2 id="__init__"><code>__init__</code></h2>
+
+```python
+__init__(
+    options=NOTHING,
+    shuffle_seed=attr_dict['shuffle_seed'].default,
+    shuffle_reshuffle_each_iteration=attr_dict['shuffle_reshuffle_each_iteration'].default,
+    interleave_parallel_reads=attr_dict['interleave_parallel_reads'].default,
+    interleave_block_length=attr_dict['interleave_block_length'].default,
+    experimental_interleave_sort_fn=attr_dict['experimental_interleave_sort_fn'].default
+)
+```
+
+Initialize self. See help(type(self)) for accurate signature.
+
+## Methods
+
+<h3 id="__eq__"><code>__eq__</code></h3>
+
+```python
+__eq__(other)
+```
+
+Return self==value.
+
+<h3 id="__ge__"><code>__ge__</code></h3>
+
+```python
+__ge__(other)
+```
+
+Automatically created by attrs.
+
+<h3 id="__gt__"><code>__gt__</code></h3>
+
+```python
+__gt__(other)
+```
+
+Automatically created by attrs.
+
+<h3 id="__le__"><code>__le__</code></h3>
+
+```python
+__le__(other)
+```
+
+Automatically created by attrs.
+
+<h3 id="__lt__"><code>__lt__</code></h3>
+
+```python
+__lt__(other)
+```
+
+Automatically created by attrs.
+
+<h3 id="__ne__"><code>__ne__</code></h3>
+
+```python
+__ne__(other)
+```
+
+Check equality and either forward a NotImplemented or return the result negated.
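
As a usage sketch for the new class — assuming, as the attributes above suggest, that `DatasetBuilder.as_dataset` accepts a `read_config` argument at this version — `ReadConfig` lets you pin the file-shuffling seed and tune interleaving:

```python
import tensorflow_datasets as tfds

# Fix the shuffle seed so file-level shuffling is reproducible across runs,
# and lower the parallel-read fan-out from its default of 16.
read_config = tfds.ReadConfig(
    shuffle_seed=42,
    interleave_parallel_reads=4,
)

builder = tfds.builder("mnist")
builder.download_and_prepare()
ds = builder.as_dataset(
    split="train",
    shuffle_files=True,
    read_config=read_config,
)
```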
