Commit 8cea22f

cyfra authored and copybara-github committed
Document which datasets require manual download.
PiperOrigin-RevId: 282954909
1 parent 90b9a17 commit 8cea22f

26 files changed: 1128 additions and 50 deletions

docs/catalog/_toc.yaml

Lines changed: 18 additions & 18 deletions
@@ -9,7 +9,7 @@ toc:
 title: Audio
 - section:
 - path: /datasets/catalog/abstract_reasoning
-title: abstract_reasoning
+title: abstract_reasoning (manual)
 - path: /datasets/catalog/aflw2k3d
 title: aflw2k3d
 - path: /datasets/catalog/bigearthnet
@@ -33,7 +33,7 @@ toc:
 - path: /datasets/catalog/celeb_a
 title: celeb_a
 - path: /datasets/catalog/celeb_a_hq
-title: celeb_a_hq
+title: celeb_a_hq (manual)
 - path: /datasets/catalog/cifar10
 title: cifar10
 - path: /datasets/catalog/cifar100
@@ -55,13 +55,13 @@ toc:
 - path: /datasets/catalog/colorectal_histology_large
 title: colorectal_histology_large
 - path: /datasets/catalog/curated_breast_imaging_ddsm
-title: curated_breast_imaging_ddsm
+title: curated_breast_imaging_ddsm (manual)
 - path: /datasets/catalog/cycle_gan
 title: cycle_gan
 - path: /datasets/catalog/deep_weeds
 title: deep_weeds
 - path: /datasets/catalog/diabetic_retinopathy_detection
-title: diabetic_retinopathy_detection
+title: diabetic_retinopathy_detection (manual)
 - path: /datasets/catalog/dmlab
 title: dmlab
 - path: /datasets/catalog/downsampled_imagenet
@@ -85,11 +85,11 @@ toc:
 - path: /datasets/catalog/i_naturalist2017
 title: i_naturalist2017
 - path: /datasets/catalog/image_label_folder
-title: image_label_folder
+title: image_label_folder (manual)
 - path: /datasets/catalog/imagenet2012
-title: imagenet2012
+title: imagenet2012 (manual)
 - path: /datasets/catalog/imagenet2012_corrupted
-title: imagenet2012_corrupted
+title: imagenet2012_corrupted (manual)
 - path: /datasets/catalog/imagenet_resized
 title: imagenet_resized
 - path: /datasets/catalog/kmnist
@@ -127,7 +127,7 @@ toc:
 - path: /datasets/catalog/quickdraw_bitmap
 title: quickdraw_bitmap
 - path: /datasets/catalog/resisc45
-title: resisc45
+title: resisc45 (manual)
 - path: /datasets/catalog/rock_paper_scissors
 title: rock_paper_scissors
 - path: /datasets/catalog/scene_parse150
@@ -193,15 +193,15 @@ toc:
 - path: /datasets/catalog/multi_news
 title: multi_news
 - path: /datasets/catalog/newsroom
-title: newsroom
+title: newsroom (manual)
 - path: /datasets/catalog/reddit_tifu
 title: reddit_tifu
 - path: /datasets/catalog/scientific_papers
 title: scientific_papers
 - path: /datasets/catalog/wikihow
-title: wikihow
+title: wikihow (manual)
 - path: /datasets/catalog/xsum
-title: xsum
+title: xsum (manual)
 title: Summarization
 - section:
 - path: /datasets/catalog/c4
@@ -247,19 +247,19 @@ toc:
 - path: /datasets/catalog/ted_multi_translate
 title: ted_multi_translate
 - path: /datasets/catalog/wmt14_translate
-title: wmt14_translate
+title: wmt14_translate (manual)
 - path: /datasets/catalog/wmt15_translate
-title: wmt15_translate
+title: wmt15_translate (manual)
 - path: /datasets/catalog/wmt16_translate
-title: wmt16_translate
+title: wmt16_translate (manual)
 - path: /datasets/catalog/wmt17_translate
-title: wmt17_translate
+title: wmt17_translate (manual)
 - path: /datasets/catalog/wmt18_translate
-title: wmt18_translate
+title: wmt18_translate (manual)
 - path: /datasets/catalog/wmt19_translate
-title: wmt19_translate
+title: wmt19_translate (manual)
 - path: /datasets/catalog/wmt_t2t_translate
-title: wmt_t2t_translate
+title: wmt_t2t_translate (manual)
 title: Translate
 - section:
 - path: /datasets/catalog/bair_robot_pushing_small

docs/catalog/abstract_reasoning.md

Lines changed: 53 additions & 1 deletion
@@ -10,7 +10,7 @@
 <meta itemprop="citation" content="@InProceedings{pmlr-v80-barrett18a,&#10; title = {Measuring abstract reasoning in neural networks},&#10; author = {Barrett, David and Hill, Felix and Santoro, Adam and Morcos, Ari and Lillicrap, Timothy},&#10; booktitle = {Proceedings of the 35th International Conference on Machine Learning},&#10; pages = {511--520},&#10; year = {2018},&#10; editor = {Dy, Jennifer and Krause, Andreas},&#10; volume = {80},&#10; series = {Proceedings of Machine Learning Research},&#10; address = {Stockholmsmassan, Stockholm Sweden},&#10; month = {10--15 Jul},&#10; publisher = {PMLR},&#10; pdf = {http://proceedings.mlr.press/v80/barrett18a/barrett18a.pdf},&#10; url = {http://proceedings.mlr.press/v80/barrett18a.html},&#10; abstract = {Whether neural networks can learn abstract reasoning or whetherthey merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training data and test questions differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model's ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.}&#10;}&#10;" />
 </div>
 
-# `abstract_reasoning`
+# `abstract_reasoning` (Manual download)
 
 Procedurally Generated Matrices (PGM) data from the paper Measuring Abstract
 Reasoning in Neural Networks, Barrett, Hill, Santoro et al. 2018. The goal is to
@@ -113,10 +113,18 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
+
 None computed
 
 ### Features
+
 ```python
 FeaturesDict({
 'answers': Video(Image(shape=(160, 160, 1), dtype=tf.uint8)),
@@ -133,6 +141,7 @@ FeaturesDict({
 * [https://github.com/deepmind/abstract-reasoning-matrices](https://github.com/deepmind/abstract-reasoning-matrices)
 
 ## `abstract_reasoning/interpolation`
+
 As in the neutral split, $S$ consisted of any \
 triples $[r, o, a]$. For interpolation, in the training set, when the \
 attribute was "colour" or "size" (i.e., the ordered attributes), the values of \
@@ -145,6 +154,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 
 None computed
@@ -176,6 +191,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
 
@@ -208,6 +229,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
 
@@ -228,6 +255,7 @@ FeaturesDict({
 * [https://github.com/deepmind/abstract-reasoning-matrices](https://github.com/deepmind/abstract-reasoning-matrices)
 
 ## `abstract_reasoning/attr.rels`
+
 In our dataset, there are 29 possible unique \
 triples $[r,o,a]$. We allocated seven of these for the test set, at random, \
 but such that each of the attributes was represented exactly once in this set. \
@@ -238,6 +266,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
 
@@ -270,6 +304,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
 
@@ -299,6 +339,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
 
@@ -328,6 +374,12 @@ Versions:
 
 * **`0.0.2`** (default):
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/abstract_reasoning/`):
+Data can be downloaded from
+https://console.cloud.google.com/storage/browser/ravens-matrices Please put all
+the tar.gz files in manual_dir.
+
 ### Statistics
 None computed
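The warning added throughout `abstract_reasoning.md` asks the reader to put all the tar.gz files in manual_dir before the dataset is prepared. As a minimal sketch of a pre-flight check, the snippet below lists the archives found in that directory using only the Python standard library. The helper name `find_manual_archives` is hypothetical and is not part of `tensorflow_datasets`; the default path is the one quoted in the warning.

```python
from pathlib import Path

# Default location quoted in the warning text added by this commit.
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/manual/abstract_reasoning/"


def find_manual_archives(manual_dir=DEFAULT_MANUAL_DIR, suffix=".tar.gz"):
    """Hypothetical helper: list archive files placed directly in manual_dir.

    Returns a sorted list of file names ending in `suffix`, or an empty
    list when the directory does not exist.
    """
    root = Path(manual_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )


if __name__ == "__main__":
    archives = find_manual_archives()
    if archives:
        print("Found archives:", ", ".join(archives))
    else:
        print("No .tar.gz files found in", DEFAULT_MANUAL_DIR)
```

An empty result suggests the manual download step has not been completed yet. The check only inspects file names and does not validate archive contents.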

docs/catalog/celeb_a_hq.md

Lines changed: 69 additions & 1 deletion
@@ -2,13 +2,15 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
 <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
+
 <meta itemprop="name" content="celeb_a_hq" />
 <meta itemprop="description" content="High-quality version of the CELEBA&#10;dataset, consisting of 30000 images in 1024 x 1024 resolution.&#10;&#10;WARNING: This dataset currently requires you to prepare images on your own.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('celeb_a_hq', split='train')&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/celeb_a_hq" />
 <meta itemprop="sameAs" content="https://github.com/tkarras/progressive_growing_of_gans" />
 <meta itemprop="citation" content="@article{DBLP:journals/corr/abs-1710-10196,&#10; author = {Tero Karras and&#10; Timo Aila and&#10; Samuli Laine and&#10; Jaakko Lehtinen},&#10; title = {Progressive Growing of GANs for Improved Quality, Stability, and Variation},&#10; journal = {CoRR},&#10; volume = {abs/1710.10196},&#10; year = {2017},&#10; url = {http://arxiv.org/abs/1710.10196},&#10; archivePrefix = {arXiv},&#10; eprint = {1710.10196},&#10; timestamp = {Mon, 13 Aug 2018 16:46:42 +0200},&#10; biburl = {https://dblp.org/rec/bib/journals/corr/abs-1710-10196},&#10; bibsource = {dblp computer science bibliography, https://dblp.org}&#10;}&#10;" />
 </div>
-# `celeb_a_hq`
+
+# `celeb_a_hq` (Manual download)
 
 High-quality version of the CELEBA dataset, consisting of 30000 images in 1024 x
 1024 resolution.
@@ -54,6 +56,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -81,6 +89,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -108,6 +122,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -135,6 +155,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -162,6 +188,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -189,6 +221,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -216,6 +254,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -243,6 +287,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -270,6 +320,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -297,6 +353,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
@@ -324,6 +386,12 @@ Versions:
 * **`0.1.0`** (default):
 * `2.0.0`: New split API (https://tensorflow.org/datasets/splits)
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/celeb_a_hq/`): manual_dir
+should contain multiple tar files with images (data2x2.tar, data4x4.tar ..
+data1024x1024.tar). Detailed instructions are here:
+https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training
+
 ### Statistics
 
 Split | Examples
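
The `celeb_a_hq` warning names the expected archives only by their endpoints (data2x2.tar, data4x4.tar .. data1024x1024.tar). The sketch below assumes, without confirmation from this commit, that the elided names follow successive powers of two, and reports which of those files are absent from manual_dir. Both function names are hypothetical and are not part of `tensorflow_datasets`.

```python
from pathlib import Path


def expected_tar_names(first=2, last=1024):
    """Build names of the form dataNxN.tar for N doubling from first to last.

    ASSUMPTION: the warning only spells out data2x2.tar, data4x4.tar and
    data1024x1024.tar; the doubling pattern in between is inferred.
    """
    names = []
    size = first
    while size <= last:
        names.append(f"data{size}x{size}.tar")
        size *= 2
    return names


def missing_tars(manual_dir="~/tensorflow_datasets/manual/celeb_a_hq/"):
    """Return the expected tar names that are not regular files in manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in expected_tar_names() if not (root / name).is_file()]


if __name__ == "__main__":
    absent = missing_tars()
    if absent:
        print("Missing archives:", ", ".join(absent))
    else:
        print("All expected archives are present.")
```

If the real file set differs from this assumed pattern, the linked preparation instructions remain the reference and the name list should be adjusted to match them.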
