Commit 75dbe6a

Merge branch 'master' into opinosis
Author: Alex Fabbri
2 parents 632825c + 557b040, commit 75dbe6a

1,672 files changed

+184304
-5512
lines changed

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
# Add Dataset
2+
3+
* Dataset Name: <name>
4+
* Issue Reference: <link>
5+
* `dataset_info.json` Gist: <link>
6+
7+
## Description
8+
9+
<description>
10+
11+
## Checklist
12+
* [ ] Address all TODO's
13+
* [ ] Add alphabetized import to subdirectory's `__init__.py`
14+
* [ ] Run `download_and_prepare` successfully
15+
* [ ] Add checksums file
16+
* [ ] Properly cite in `BibTeX` format
17+
* [ ] Add passing test(s)
18+
* [ ] Add test data
19+
* [ ] Add data generation script (if applicable)
20+
* [ ] Lint code

README.md

Lines changed: 30 additions & 27 deletions
@@ -46,8 +46,7 @@ to receive updates on the project.
4646
import tensorflow_datasets as tfds
4747
import tensorflow as tf
4848

49-
# tfds works in both Eager and Graph modes
50-
tf.compat.v1.enable_eager_execution()
49+
# Here we assume Eager mode is enabled (TF2), but tfds also works in Graph mode.
5150

5251
# See available datasets
5352
print(tfds.list_builders())
@@ -92,32 +91,36 @@ ds = mnist_builder.as_dataset(split='train')
9291
# dataset and its features
9392
info = mnist_builder.info
9493
print(info)
94+
```
95+
96+
This will print the dataset info content:
9597

96-
tfds.core.DatasetInfo(
97-
name='mnist',
98-
version=1.0.0,
99-
description='The MNIST database of handwritten digits.',
100-
homepage='http://yann.lecun.com/exdb/mnist/',
101-
features=FeaturesDict({
102-
'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
103-
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
104-
},
105-
total_num_examples=70000,
106-
splits={
107-
'test': <tfds.core.SplitInfo num_examples=10000>,
108-
'train': <tfds.core.SplitInfo num_examples=60000>
109-
},
110-
supervised_keys=('image', 'label'),
111-
citation='"""
112-
@article{lecun2010mnist,
113-
title={MNIST handwritten digit database},
114-
author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
115-
journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
116-
volume={2},
117-
year={2010}
118-
}
119-
"""',
120-
)
98+
```
99+
tfds.core.DatasetInfo(
100+
name='mnist',
101+
version=1.0.0,
102+
description='The MNIST database of handwritten digits.',
103+
homepage='http://yann.lecun.com/exdb/mnist/',
104+
features=FeaturesDict({
105+
'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
106+
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10)
107+
},
108+
total_num_examples=70000,
109+
splits={
110+
'test': <tfds.core.SplitInfo num_examples=10000>,
111+
'train': <tfds.core.SplitInfo num_examples=60000>
112+
},
113+
supervised_keys=('image', 'label'),
114+
citation='"""
115+
@article{lecun2010mnist,
116+
title={MNIST handwritten digit database},
117+
author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
118+
journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
119+
volume={2},
120+
year={2010}
121+
}
122+
"""',
123+
)
121124
```
122125

123126
You can also get details about the classes (number of classes and their names).

docs/_index.ipynb

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
3737
"from __future__ import division\n",
3838
"from __future__ import print_function\n",
3939
"\n",
40-
"import tensorflow as tf\n",
40+
"import tensorflow.compat.v2 as tf\n",
4141
"import tensorflow_datasets as tfds\n",
4242
"\n",
4343
"# tfds works in both Eager and Graph modes\n",

docs/_index.yaml

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@ landing_page:
2323
<a href="./datasets">list of datasets</a>.
2424
- code_block: |
2525
<pre class = "prettyprint">
26-
import tensorflow as tf
26+
import tensorflow.compat.v2 as tf
2727
import tensorflow_datasets as tfds
2828
2929
# tfds works in both Eager and Graph modes
@@ -48,10 +48,10 @@ landing_page:
4848
items:
4949
- heading: Introducing TensorFlow Datasets
5050
image_path: /resources/images/tf-logo-card-16x9.png
51-
path: https://github.com/tensorflow/datasets/blob/master/docs/announce_proxy.md
51+
path: https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html
5252
buttons:
5353
- label: Read on TensorFlow Blog
54-
path: https://github.com/tensorflow/datasets/blob/master/docs/announce_proxy.md
54+
path: https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html
5555
- heading: TensorFlow Datasets on GitHub
5656
image_path: /resources/images/github-card-16x9.png
5757
path: https://github.com/tensorflow/datasets

docs/add_dataset.md

Lines changed: 9 additions & 13 deletions
@@ -30,8 +30,7 @@ isn't already added.
3030
* [3. Double-check the citation](#3-double-check-the-citation)
3131
* [4. Add a test](#4-add-a-test)
3232
* [5. Check your code style](#5-check-your-code-style)
33-
* [6. Add release notes](#6-add-release-notes)
34-
* [7. Send for review!](#7-send-for-review)
33+
* [6. Send for review!](#6-send-for-review)
3534
* [Define the dataset outside TFDS](#define-the-dataset-outside-tfds)
3635
* [Large datasets and distributed generation](#large-datasets-and-distributed-generation)
3736
* [Testing `MyDataset`](#testing-mydataset)
@@ -312,7 +311,7 @@ additional dependencies only as needed, use `tfds.core.lazy_imports`.
312311
To use `lazy_imports`:
313312

314313
* Add an entry for your dataset into `DATASET_EXTRAS` in
315-
[`setup.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/setup.py).
314+
[`setup.py`](https://github.com/tensorflow/datasets/tree/master/setup.py).
316315
This makes it so that users can do, for example, `pip install
317316
'tensorflow-datasets[svhn]'` to install the extra dependencies.
318317
* Add an entry for your import to
@@ -543,7 +542,7 @@ except TensorFlow uses 2 spaces instead of 4. Please conform to the
543542
[Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md),
544543

545544
Most importantly, use
546-
[`tensorflow_datasets/oss_scripts/lint.sh`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/oss_scripts/lint.sh)
545+
[`tensorflow_datasets/oss_scripts/lint.sh`](https://github.com/tensorflow/datasets/tree/master/oss_scripts/lint.sh)
547546
to ensure your code is properly formatted. For example, to lint the `image`
548547
directory:
549548

@@ -555,13 +554,7 @@ See
555554
[TensorFlow code style guide](https://www.tensorflow.org/community/contribute/code_style)
556555
for more information.
557556

558-
### 6. Add release notes
559-
560-
Add the dataset to the
561-
[release notes](https://github.com/tensorflow/datasets/tree/master/docs/release_notes.md).
562-
The release note will be published for the next release.
563-
564-
### 7. Send for review!
557+
### 6. Send for review!
565558

566559
Send the pull request for review.
567560

@@ -586,7 +579,7 @@ To create this checksum file the first time, you can use the
586579
`tensorflow_datasets.scripts.download_and_prepare` script and pass the flags
587580
`--register_checksums --checksums_dir=/path/to/checksums_dir`.
588581

589-
### 2. Adjust the fake example direcory
582+
### 2. Adjust the fake example directory
590583

591584
For testing, instead of using the default
592585
[fake example directory](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/test_data/fake_examples)
@@ -595,7 +588,7 @@ you can define your own by setting the `EXAMPLE_DIR` property of
595588

596589
```
597590
class MyDatasetTest(tfds.testing.DatasetBuilderTestCase):
598-
EXAMPLE_DIR = 'path/to/fakedata'`
591+
EXAMPLE_DIR = 'path/to/fakedata'
599592
```
600593

601594
## Large datasets and distributed generation
@@ -617,6 +610,9 @@ as downloaded and extracted. It can be created manually or automatically with a
617610
script
618611
([example script](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/cifar.py)).
619612

613+
If you're using automation to generate the test data, please include that script
614+
in [`testing`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing).
615+
620616
Make sure to use different data in your test data splits, as the test will
621617
fail if your dataset splits overlap.
622618

docs/api_docs/python/_toc.yaml

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ toc:
1717
path: /datasets/api_docs/python/tfds/load
1818
- title: percent
1919
path: /datasets/api_docs/python/tfds/percent
20+
- title: ReadConfig
21+
path: /datasets/api_docs/python/tfds/ReadConfig
2022
- title: show_examples
2123
path: /datasets/api_docs/python/tfds/show_examples
2224
- title: Split

docs/api_docs/python/index.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
44

55
* <a href="./tfds.md"><code>tfds</code></a>
66
* <a href="./tfds/download/GenerateMode.md"><code>tfds.GenerateMode</code></a>
7+
* <a href="./tfds/ReadConfig.md"><code>tfds.ReadConfig</code></a>
78
* <a href="./tfds/Split.md"><code>tfds.Split</code></a>
89
* <a href="./tfds/as_numpy.md"><code>tfds.as_numpy</code></a>
910
* <a href="./tfds/builder.md"><code>tfds.builder</code></a>

docs/api_docs/python/tfds.md

Lines changed: 3 additions & 1 deletion
@@ -53,6 +53,8 @@ converting various units.
5353

5454
[`class GenerateMode`](./tfds/download/GenerateMode.md): `Enum` for how to treat pre-existing downloads and data.
5555

56+
[`class ReadConfig`](./tfds/ReadConfig.md): Configures input reading pipeline.
57+
5658
[`class Split`](./tfds/Split.md): `Enum` for dataset splits.
5759

5860
[`class percent`](./tfds/percent.md): Syntactic sugar for defining slice subsplits: `tfds.percent[75:-5]`.
@@ -78,4 +80,4 @@ from an image classification dataset.
7880

7981
## Other Members
8082

81-
* `__version__ = '1.3.0'` <a id="__version__"></a>
83+
* `__version__ = '1.3.2'` <a id="__version__"></a>
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
2+
<meta itemprop="name" content="tfds.ReadConfig" />
3+
<meta itemprop="path" content="Stable" />
4+
<meta itemprop="property" content="__eq__"/>
5+
<meta itemprop="property" content="__ge__"/>
6+
<meta itemprop="property" content="__gt__"/>
7+
<meta itemprop="property" content="__init__"/>
8+
<meta itemprop="property" content="__le__"/>
9+
<meta itemprop="property" content="__lt__"/>
10+
<meta itemprop="property" content="__ne__"/>
11+
</div>
12+
13+
# tfds.ReadConfig
14+
15+
<!-- Insert buttons and diff -->
16+
17+
<table class="tfo-notebook-buttons tfo-api" align="left">
18+
</table>
19+
20+
<a target="_blank" href="https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/utils/read_config.py">View
21+
source</a>
22+
23+
<!-- Equality marker -->
24+
## Class `ReadConfig`
25+
26+
Configures input reading pipeline.
27+
28+
<!-- Placeholder for "Used in" -->
29+
30+
#### Attributes:
31+
32+
* <b>`options`</b>: `tf.data.Options()`, dataset options. Those options are
33+
added to the default values defined in `tfrecord_reader.py`. Note that when
34+
`shuffle_files` is True and no seed is defined, experimental_deterministic
35+
will be set to False internally, unless it is defined here.
36+
* <b>`shuffle_seed`</b>: `tf.int64`, seeds forwarded to
37+
`tf.data.Dataset.shuffle` when `shuffle_files=True`.
38+
* <b>`shuffle_reshuffle_each_iteration`</b>: `bool`, forwarded to
39+
`tf.data.Dataset.shuffle` when `shuffle_files=True`.
40+
* <b>`interleave_parallel_reads`</b>: `int`, forwarded to
41+
`tf.data.Dataset.interleave`. Default to 16.
42+
* <b>`interleave_block_length`</b>: `int`, forwarded to
43+
`tf.data.Dataset.interleave`. Default to 16.
44+
* <b>`experimental_interleave_sort_fn`</b>: Function with signature
45+
`List[FileDict] -> List[FileDict]`, which takes the list of `dict(file: str,
46+
take: int, skip: int)` and returns the modified version to read. This can be
47+
used to sort/shuffle the shards to read in a custom order, instead of
48+
relying on `shuffle_files=True`.
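
Editor's note (not part of the commit): to illustrate the `experimental_interleave_sort_fn` attribute described above, here is a minimal sketch of a function with the documented `List[FileDict] -> List[FileDict]` shape. The function name `reverse_by_file_name`, the `FileDict` alias, the shard file names, and the `take`/`skip` values are all hypothetical and chosen only for illustration. The sketch uses plain Python dicts and does not call TFDS.

```python
from typing import Dict, List, Union

# Assumed shape of one entry, following the documented
# dict(file: str, take: int, skip: int).
FileDict = Dict[str, Union[str, int]]


def reverse_by_file_name(file_dicts: List[FileDict]) -> List[FileDict]:
    """Return a new list with the shards ordered by file name, descending."""
    return sorted(file_dicts, key=lambda d: str(d["file"]), reverse=True)


# Hypothetical shard descriptions; real names and values would come from TFDS.
shards = [
    {"file": "shard-00000-of-00003", "take": 10, "skip": 0},
    {"file": "shard-00002-of-00003", "take": 10, "skip": 0},
    {"file": "shard-00001-of-00003", "take": 5, "skip": 5},
]

ordered = reverse_by_file_name(shards)
print([d["file"] for d in ordered])
```

Going by the `__init__` signature listed in this file, a function of this shape would presumably be supplied through the `experimental_interleave_sort_fn` argument of `tfds.ReadConfig`. Returning a new list rather than sorting in place leaves the caller's list untouched.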
49+
50+
<h2 id="__init__"><code>__init__</code></h2>
51+
52+
```python
53+
__init__(
54+
options=NOTHING,
55+
shuffle_seed=attr_dict['shuffle_seed'].default,
56+
shuffle_reshuffle_each_iteration=attr_dict['shuffle_reshuffle_each_iteration'].default,
57+
interleave_parallel_reads=attr_dict['interleave_parallel_reads'].default,
58+
interleave_block_length=attr_dict['interleave_block_length'].default,
59+
experimental_interleave_sort_fn=attr_dict['experimental_interleave_sort_fn'].default
60+
)
61+
```
62+
63+
Initialize self. See help(type(self)) for accurate signature.
64+
65+
## Methods
66+
67+
<h3 id="__eq__"><code>__eq__</code></h3>
68+
69+
```python
70+
__eq__(other)
71+
```
72+
73+
Return self==value.
74+
75+
<h3 id="__ge__"><code>__ge__</code></h3>
76+
77+
```python
78+
__ge__(other)
79+
```
80+
81+
Automatically created by attrs.
82+
83+
<h3 id="__gt__"><code>__gt__</code></h3>
84+
85+
```python
86+
__gt__(other)
87+
```
88+
89+
Automatically created by attrs.
90+
91+
<h3 id="__le__"><code>__le__</code></h3>
92+
93+
```python
94+
__le__(other)
95+
```
96+
97+
Automatically created by attrs.
98+
99+
<h3 id="__lt__"><code>__lt__</code></h3>
100+
101+
```python
102+
__lt__(other)
103+
```
104+
105+
Automatically created by attrs.
106+
107+
<h3 id="__ne__"><code>__ne__</code></h3>
108+
109+
```python
110+
__ne__(other)
111+
```
112+
113+
Check equality and either forward a NotImplemented or return the result negated.

docs/api_docs/python/tfds/Split.md

Lines changed: 2 additions & 2 deletions
@@ -10,17 +10,17 @@
1010

1111
# tfds.Split
1212

13-
<!-- Insert buttons -->
13+
<!-- Insert buttons and diff -->
1414

1515
<table class="tfo-notebook-buttons tfo-api" align="left">
1616
</table>
1717

1818
<a target="_blank" href="https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/splits.py">View
1919
source</a>
2020

21+
<!-- Equality marker -->
2122
## Class `Split`
2223

23-
<!-- Start diff -->
2424
`Enum` for dataset splits.
2525

2626
<!-- Placeholder for "Used in" -->
