Commit a6a75e0

Merge branch 'master' into add-emnist-dataset
2 parents 976bb82 + 96fe8c3 commit a6a75e0


69 files changed: +1814 −158 lines

.github/ISSUE_TEMPLATE/dataset-request.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -12,4 +12,6 @@ assignees: ''
 * License of dataset: <license type>
 * Short description of dataset and use case(s): <description>
 
-Folks who would also like to see this dataset in `tensorflow/datasets`, please +1/thumbs-up so the developers can know which requests to prioritize.
+Folks who would also like to see this dataset in `tensorflow/datasets`, please thumbs-up so the developers can know which requests to prioritize.
+
+And if you'd like to contribute the dataset (thank you!), see our [guide to adding a dataset](https://github.com/tensorflow/datasets/blob/master/docs/add_dataset.md).
```

CONTRIBUTING.md

Lines changed: 29 additions & 1 deletion

```diff
@@ -1,5 +1,33 @@
 # How to Contribute
 
+Thanks for thinking about contributing to our library !
+
+
+## Before you start
+* Please accept the [Contributor License Agreement](https://cla.developers.google.com) (see below)
+* [Ask here](https://github.com/tensorflow/datasets/issues/142) to be added to
+  the list of collaborators so that issues can be assigned to you.
+* Comment on the issue that you plan to work on so we can assign it to you and
+  there isn't unnecessary duplication of work.
+* When you plan to work on something larger (for example, adding new
+  `FeatureConnectors`), please respond on the issue (or create one if there
+  isn't one) to explain your plan and give others a chance to discuss.
+* If you're fixing some smaller issue - please check the list of
+  [pending Pull Requests](https://github.com/tensorflow/datasets/pulls) to
+  avoid unnecessary duplication.
+
+
+## How you can help:
+
+You can help in multiple ways:
+
+* Adding new datasets and/or requested features (see the [issues](https://github.com/tensorflow/datasets/issues))
+* Reproducing bugs reported by others: This helps us **a lot**.
+* Doing code reviews on the Pull Requests from the community.
+* Verifying that Pull Requests from others are working correctly
+  (especially the ones that add new datasets).
+
+
 ## Datasets
 
 Adding a public dataset to `tensorflow-datasets` is a great way of making it
@@ -42,7 +70,7 @@ require:
 *Note that tests for DatasetBuilders are different and are documented in the*
 *[guide to add a dataset](https://github.com/tensorflow/datasets/tree/master/docs/add_dataset.md#testing-mydataset).*
 
-# Pull Requests
+## Pull Requests
 
 All contributions are done through Pull Requests here on GitHub.
 
```

README.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -1,6 +1,6 @@
 # TensorFlow Datasets
 
-TensorFlow Datasets provides many public datasets as `tf.data.Dataset`s.
+TensorFlow Datasets provides many public datasets as `tf.data.Datasets`.
 
 [![Kokoro](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.svg)](https://storage.googleapis.com/tfds-kokoro-public/kokoro-build.html)
 [![PyPI version](https://badge.fury.io/py/tensorflow-datasets.svg)](https://badge.fury.io/py/tensorflow-datasets)
@@ -77,7 +77,7 @@ mnist_builder = tfds.builder("mnist")
 mnist_builder.download_and_prepare()
 
 # Construct a tf.data.Dataset
-dataset = mnist_builder.as_dataset(split=tfds.Split.TRAIN)
+ds = mnist_builder.as_dataset(split=tfds.Split.TRAIN)
 
 # Get the `DatasetInfo` object, which contains useful information about the
 # dataset and its features
@@ -132,9 +132,9 @@ You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to
 get the full dataset in NumPy arrays from the returned `tf.Tensor` object:
 
 ```python
-train_data = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
-numpy_data = tfds.as_numpy(train_data)
-numpy_images, numpy_labels = numpy_dataset["image"], numpy_dataset["label"]
+train_ds = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
+numpy_ds = tfds.as_numpy(train_ds)
+numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
 ```
 
 Note that the library still requires `tensorflow` as an internal dependency.
````
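As context for the renames in the README hunk above: loading with `batch_size=-1` and passing the result through `tfds.as_numpy` yields a dict of NumPy arrays, one per feature, with the leading axis spanning the full split. A minimal sketch with plain-NumPy stand-ins (no real `tfds` download; the MNIST image shape `(28, 28, 1)` and 60,000-example train split are assumptions about the dataset, not taken from this commit):

```python
import numpy as np

# Stand-in for the dict that `tfds.as_numpy(train_ds)` returns when the
# dataset was loaded with `batch_size=-1`: one array per feature.
numpy_ds = {
    "image": np.zeros((60000, 28, 28, 1), dtype=np.uint8),  # fake MNIST images
    "label": np.zeros((60000,), dtype=np.int64),            # fake MNIST labels
}

# Same unpacking as in the README snippet.
numpy_images, numpy_labels = numpy_ds["image"], numpy_ds["label"]
print(numpy_images.shape)  # (60000, 28, 28, 1)
print(numpy_labels.shape)  # (60000,)
```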

docs/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -5,3 +5,4 @@
 * [API Documentation](https://www.tensorflow.org/datasets/api_docs/python/tfds)
 * [Splits](splits.md)
 * [Adding a new dataset](add_dataset.md)
+* [Using Google Cloud Storage to cache preprocessed data](gcs.md)
```

docs/add_dataset.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -496,12 +496,12 @@ to be updated.
 dataset. It uses "fake examples" as test data that mimic the structure of the
 source dataset.
 
-The test data should be put in in
+The test data should be put in
 [`testing/test_data/fake_examples/`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/test_data/fake_examples/)
 under the `my_dataset` directory and should mimic the source dataset artifacts
 as downloaded and extracted. It can be created manually or automatically with a
-script ([example
-script](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/cifar.py)).
+script
+([example script](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/testing/cifar.py)).
 
 Make sure to use different data in your test data splits, as the test will
 fail if your dataset splits overlap.
```
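The fake-example layout described in that hunk can be sketched as a shell session. The split file names `train.bin`/`test.bin` and the `my_dataset` directory are hypothetical; a real layout must mimic the source dataset's downloaded and extracted artifacts:

```shell
# Create the fake-example directory for a hypothetical `my_dataset`.
mkdir -p testing/test_data/fake_examples/my_dataset

# Use *different* fake bytes per split, since the test fails when splits overlap.
printf 'fake train bytes' > testing/test_data/fake_examples/my_dataset/train.bin
printf 'fake test bytes'  > testing/test_data/fake_examples/my_dataset/test.bin

ls testing/test_data/fake_examples/my_dataset
```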
