Commit 61282d4

Conchylicultor authored and copybara-github committed
Automated documentation update
PiperOrigin-RevId: 284045163
1 parent 92a562e commit 61282d4

28 files changed (2140 additions, 22 deletions)

docs/catalog/_toc.yaml

Lines changed: 3 additions & 1 deletion
@@ -205,7 +205,7 @@ toc:
 title: Summarization
 - section:
 - path: /datasets/catalog/c4
-title: c4
+title: c4 (manual)
 - path: /datasets/catalog/definite_pronoun_resolution
 title: definite_pronoun_resolution
 - path: /datasets/catalog/esnli
@@ -224,6 +224,8 @@ toc:
 title: multi_nli
 - path: /datasets/catalog/multi_nli_mismatch
 title: multi_nli_mismatch
+- path: /datasets/catalog/scicite
+title: scicite
 - path: /datasets/catalog/snli
 title: snli
 - path: /datasets/catalog/squad

docs/catalog/billsum.md

Lines changed: 6 additions & 6 deletions
@@ -21,12 +21,12 @@ summary.
 [https://github.com/FiscalNote/BillSum](https://github.com/FiscalNote/BillSum)
 * `DatasetBuilder`:
 [`tfds.summarization.billsum.Billsum`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/billsum.py)
-* Version: `v2.0.0`
+* Version: `v3.0.0`
 * Versions:
 
-* **`2.0.0`** (default):
+* **`3.0.0`** (default):
 
-* Size: `65.79 MiB`
+* Size: `64.14 MiB`
 
 ## Features
 ```python
@@ -41,9 +41,9 @@ FeaturesDict({
 
 Split | Examples
 :------ | -------:
-ALL | 24,116
-TRAIN | 19,447
-TEST | 3,432
+ALL | 23,455
+TRAIN | 18,949
+TEST | 3,269
 CA_TEST | 1,237
 
 ## Homepage
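
The BillSum row counts change with the v3.0.0 bump. A quick arithmetic sketch, using only the numbers in the two Statistics tables above, confirms that ALL is still the sum of TRAIN, TEST and CA_TEST in both versions and shows where the removed examples come from. The dictionary and function names are illustrative, not part of TFDS.

```python
# Split sizes copied from the BillSum Statistics table, before (v2.0.0)
# and after (v3.0.0) this commit.
BILLSUM_V2 = {"ALL": 24_116, "TRAIN": 19_447, "TEST": 3_432, "CA_TEST": 1_237}
BILLSUM_V3 = {"ALL": 23_455, "TRAIN": 18_949, "TEST": 3_269, "CA_TEST": 1_237}


def all_matches_parts(stats: dict) -> bool:
    """Return True if the ALL row equals the sum of the individual splits."""
    parts = sum(count for name, count in stats.items() if name != "ALL")
    return parts == stats["ALL"]


def split_deltas(old: dict, new: dict) -> dict:
    """Per-split change in example count from the old to the new version."""
    return {name: new[name] - old[name] for name in old}


if __name__ == "__main__":
    print(all_matches_parts(BILLSUM_V2), all_matches_parts(BILLSUM_V3))
    print(split_deltas(BILLSUM_V2, BILLSUM_V3))
```

Both totals check out, and the deltas show that the 661 examples dropped in v3.0.0 come from TRAIN (498) and TEST (163), while CA_TEST is unchanged.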

docs/catalog/c4.md

Lines changed: 35 additions & 1 deletion
@@ -2,13 +2,15 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
 <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
+
 <meta itemprop="name" content="c4" />
 <meta itemprop="description" content="A colossal, cleaned version of Common Crawl's web crawl corpus.&#10;&#10;Based on Common Crawl dataset: &quot;https://commoncrawl.org&quot;&#10;&#10;Due to the overhead of cleaning the dataset, it is recommend you prepare it with&#10;a distributed service like Cloud Dataflow. More info at&#10;https://www.tensorflow.org/datasets/beam_datasets.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('c4', split='train')&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/c4" />
 <meta itemprop="sameAs" content="https://github.com/google-research/text-to-text-transfer-transformer#datasets" />
 <meta itemprop="citation" content="&#10;@article{2019t5,&#10; author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},&#10; title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},&#10; journal = {arXiv e-prints},&#10; year = {2019},&#10; archivePrefix = {arXiv},&#10; eprint = {1910.10683},&#10;}&#10;" />
 </div>
-# `c4`
+
+# `c4` (Manual download)
 
 A colossal, cleaned version of Common Crawl's web crawl corpus.
 
@@ -48,6 +50,14 @@ Versions:
 * `1.0.0`: None
 * `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -75,6 +85,14 @@ Versions:
 * `1.0.0`: None
 * `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -104,6 +122,14 @@ Versions:
 * `1.0.0`: None
 * `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -133,6 +159,14 @@ Versions:
 * `1.0.0`: None
 * `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
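
Every c4 config in this commit gains the same WARNING: for the WebText-like config, `OpenWebText.zip` and the August 2018 to July 2019 Common Crawl WET files must already be in `manual_dir`, which defaults to `~/tensorflow_datasets/manual/c4/`. Because the c4 description recommends preparing the data with a distributed Beam service, it can help to fail fast before starting a long job. The sketch below is a hypothetical pre-flight check. `REQUIRED_FILES`, `missing_from_listing` and `missing_manual_files` are illustrative names, not TFDS API. Only `OpenWebText.zip` is named in the docs, so the WET files are left to a manual check. A non-default directory can be handed to TFDS through `tfds.download.DownloadConfig(manual_dir=...)`, passed as the `download_config` argument of `download_and_prepare`.

```python
import os
from typing import Iterable, List, Optional

# Default manual_dir named in the c4 WARNING.
DEFAULT_C4_MANUAL_DIR = os.path.expanduser("~/tensorflow_datasets/manual/c4/")

# Hypothetical check list. Only OpenWebText.zip is named in the docs; the
# Common Crawl WET files (August 2018 to July 2019) are not listed by file
# name, so this sketch does not try to verify them.
REQUIRED_FILES = ("OpenWebText.zip",)


def missing_from_listing(filenames: Iterable[str]) -> List[str]:
    """Return the required files absent from a directory listing."""
    present = set(filenames)
    return [name for name in REQUIRED_FILES if name not in present]


def missing_manual_files(manual_dir: Optional[str] = None) -> List[str]:
    """List required files not yet placed in manual_dir (or the default)."""
    manual_dir = manual_dir or DEFAULT_C4_MANUAL_DIR
    # A directory that does not exist yet counts as an empty listing.
    listing = os.listdir(manual_dir) if os.path.isdir(manual_dir) else []
    return missing_from_listing(listing)


if __name__ == "__main__":
    missing = missing_manual_files()
    if missing:
        print(f"Place {missing} in {DEFAULT_C4_MANUAL_DIR}")
    else:
        print("OpenWebText.zip found; check the WET files by hand.")
```

The check is only relevant to the WebText-like config, which the WARNING singles out.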
docs/catalog/gigaword.md

Lines changed: 2 additions & 2 deletions
@@ -21,10 +21,10 @@ There are two features: - document: article. - summary: headline.
 [https://github.com/harvardnlp/sent-summary](https://github.com/harvardnlp/sent-summary)
 * `DatasetBuilder`:
 [`tfds.summarization.gigaword.Gigaword`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/gigaword.py)
-* Version: `v1.1.0`
+* Version: `v1.2.0`
 * Versions:
 
-* **`1.1.0`** (default):
+* **`1.2.0`** (default):
 
 * Size: `551.61 MiB`
 
docs/catalog/overview.md

Lines changed: 1 addition & 0 deletions
@@ -146,6 +146,7 @@ np_datasets = tfds.as_numpy(datasets)
 * [`math_dataset`](math_dataset.md)
 * [`multi_nli`](multi_nli.md)
 * [`multi_nli_mismatch`](multi_nli_mismatch.md)
+* [`scicite`](scicite.md)
 * [`snli`](snli.md)
 * [`squad`](squad.md)
 * [`super_glue`](super_glue.md)

docs/catalog/oxford_flowers102.md

Lines changed: 5 additions & 3 deletions
@@ -2,12 +2,14 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
 <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
+
 <meta itemprop="name" content="oxford_flowers102" />
-<meta itemprop="description" content="&#10;The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly occurring&#10;in the United Kingdom. Each class consists of between 40 and 258 images. The images have&#10;large scale, pose and light variations. In addition, there are categories that have large&#10;variations within the category and several very similar categories.&#10;&#10;The dataset is divided into a training set, a validation set and a test set.&#10;The training set and validation set each consist of 10 images per class (totalling 1030 images each).&#10;The test set consist of the remaining 6129 images (minimum 20 per class).&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('oxford_flowers102', split='train')&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+<meta itemprop="description" content="&#10;The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly occurring&#10;in the United Kingdom. Each class consists of between 40 and 258 images. The images have&#10;large scale, pose and light variations. In addition, there are categories that have large&#10;variations within the category and several very similar categories.&#10;&#10;The dataset is divided into a training set, a validation set and a test set.&#10;The training set and validation set each consist of 10 images per class (totalling 1020 images each).&#10;The test set consists of the remaining 6149 images (minimum 20 per class).&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('oxford_flowers102', split='train')&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/oxford_flowers102" />
 <meta itemprop="sameAs" content="https://www.robots.ox.ac.uk/~vgg/data/flowers/102/" />
 <meta itemprop="citation" content="@InProceedings{Nilsback08,&#10; author = &quot;Nilsback, M-E. and Zisserman, A.&quot;,&#10; title = &quot;Automated Flower Classification over a Large Number of Classes&quot;,&#10; booktitle = &quot;Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing&quot;,&#10; year = &quot;2008&quot;,&#10; month = &quot;Dec&quot;&#10;}&#10;" />
 </div>
+
 # `oxford_flowers102`
 
 The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly
@@ -18,8 +20,8 @@ very similar categories.
 
 The dataset is divided into a training set, a validation set and a test set. The
 training set and validation set each consist of 10 images per class (totalling
-1030 images each). The test set consist of the remaining 6129 images (minimum 20
-per class).
+1020 images each). The test set consists of the remaining 6149 images (minimum
+20 per class).
 
 * URL:
 [https://www.robots.ox.ac.uk/~vgg/data/flowers/102/](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)
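
The oxford_flowers102 change replaces 1030 with 1020 and 6129 with 6149. The arithmetic sketch below uses only the figures stated in the description: 102 classes, 10 training and 10 validation images per class, 40 to 258 images per class, and at least 20 test images per class. It checks that the corrected numbers are self-consistent. The constant and function names are illustrative only.

```python
# Figures taken from the oxford_flowers102 description in this diff.
NUM_CLASSES = 102
TRAIN_PER_CLASS = 10
VAL_PER_CLASS = 10
MIN_PER_CLASS, MAX_PER_CLASS = 40, 258  # "between 40 and 258 images" per class
MIN_TEST_PER_CLASS = 20


def split_sizes(test_images: int) -> dict:
    """Derive train/validation sizes from the per-class quota, plus the total."""
    train = NUM_CLASSES * TRAIN_PER_CLASS
    val = NUM_CLASSES * VAL_PER_CLASS
    return {"train": train, "validation": val, "test": test_images,
            "total": train + val + test_images}


new = split_sizes(test_images=6149)  # corrected figure from this commit
old_total = 1030 + 1030 + 6129       # figures before this commit

if __name__ == "__main__":
    print(new)
    print("old total:", old_total)
```

Only 1020 equals 102 times 10. Both the old and the new figures sum to 8,189 images, so the correction moves 20 images from the two 10-per-class splits into the test split and leaves the total unchanged. The smallest class (40 images) minus 10 training and 10 validation images leaves exactly 20 for testing, which matches the "minimum 20 per class" note.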

docs/catalog/scicite.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+<div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+<meta itemprop="name" content="TensorFlow Datasets" />
+</div>
+
+<meta itemprop="name" content="scicite" />
+<meta itemprop="description" content="&#10;This is a dataset for classifying citation intents in academic papers.&#10;The main citation intent label for each Json object is specified with the label&#10;key while the citation context is specified in with a context key. Example:&#10;{&#10; 'string': 'In chacma baboons, male-infant relationships can be linked to both&#10; formation of friendships and paternity success [30,31].'&#10; 'sectionName': 'Introduction',&#10; 'label': 'background',&#10; 'citingPaperId': '7a6b2d4b405439',&#10; 'citedPaperId': '9d1abadc55b5e0',&#10; ...&#10; }&#10;You may obtain the full information about the paper using the provided paper ids&#10;with the Semantic Scholar API (https://api.semanticscholar.org/).&#10;The labels are:&#10;Method, Background, Result&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('scicite', split='train')&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+<meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/scicite" />
+<meta itemprop="sameAs" content="https://github.com/allenai/scicite" />
+<meta itemprop="citation" content="&#10;@InProceedings{Cohan2019Structural,&#10; author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},&#10; title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},&#10; booktitle=&quot;NAACL&quot;,&#10; year=&quot;2019&quot;&#10;}&#10;" />
+</div>
+
+# `scicite`
+
+This is a dataset for classifying citation intents in academic papers. The main
+citation intent label for each Json object is specified with the label key while
+the citation context is specified in with a context key. Example: { 'string':
+'In chacma baboons, male-infant relationships can be linked to both formation of
+friendships and paternity success [30,31].' 'sectionName': 'Introduction',
+'label': 'background', 'citingPaperId': '7a6b2d4b405439', 'citedPaperId':
+'9d1abadc55b5e0', ... } You may obtain the full information about the paper
+using the provided paper ids with the Semantic Scholar API
+(https://api.semanticscholar.org/). The labels are: Method, Background, Result
+
+* URL:
+[https://github.com/allenai/scicite](https://github.com/allenai/scicite)
+* `DatasetBuilder`:
+[`tfds.text.scicite.Scicite`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/scicite.py)
+* Version: `v1.0.0`
+* Versions:
+
+* **`1.0.0`** (default):
+
+* Size: `22.12 MiB`
+
+## Features
+
+```python
+FeaturesDict({
+'citeEnd': Tensor(shape=(), dtype=tf.int64),
+'citeStart': Tensor(shape=(), dtype=tf.int64),
+'citedPaperId': Text(shape=(), dtype=tf.string),
+'citingPaperId': Text(shape=(), dtype=tf.string),
+'excerpt_index': Tensor(shape=(), dtype=tf.int32),
+'id': Text(shape=(), dtype=tf.string),
+'isKeyCitation': Tensor(shape=(), dtype=tf.bool),
+'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
+'label2': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
+'label2_confidence': Tensor(shape=(), dtype=tf.float32),
+'label_confidence': Tensor(shape=(), dtype=tf.float32),
+'sectionName': Text(shape=(), dtype=tf.string),
+'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=7),
+'string': Text(shape=(), dtype=tf.string),
+})
+```
+
+## Statistics
+
+Split | Examples
+:--------- | -------:
+ALL | 10,969
+TRAIN | 8,194
+TEST | 1,859
+VALIDATION | 916
+
+## Homepage
+
+* [https://github.com/allenai/scicite](https://github.com/allenai/scicite)
+
+## Supervised keys (for `as_supervised=True`)
+
+`(u'string', u'label')`
+
+## Citation
+
+```
+@InProceedings{Cohan2019Structural,
+author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},
+title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},
+booktitle="NAACL",
+year="2019"
+}
+```
+
+--------------------------------------------------------------------------------
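
The new scicite page declares `label` as a 3-class `ClassLabel`, lists the labels as Method, Background, Result, and states that `as_supervised=True` yields `(string, label)` pairs. The plain-Python sketch below decodes such pairs and checks the split table. It assumes, without confirmation from this page, that label ids follow the order in which the description lists the labels. With TFDS itself, `tfds.load('scicite', with_info=True)` also returns an info object whose `info.features['label'].names` gives the authoritative mapping. `LABEL_NAMES` and `decode_pair` are hypothetical names.

```python
from typing import Tuple

# Split sizes from the scicite Statistics table in this diff.
SCICITE_SPLITS = {"TRAIN": 8_194, "TEST": 1_859, "VALIDATION": 916}
SCICITE_ALL = 10_969

# Assumption: ids 0, 1, 2 follow the order the description lists the labels
# in. Check info.features["label"].names before relying on this mapping.
LABEL_NAMES = ("method", "background", "result")


def decode_pair(pair: Tuple[str, int]) -> Tuple[str, str]:
    """Turn one (string, label) pair from as_supervised=True into (text, label name)."""
    text, label_id = pair
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"label id {label_id} is outside num_classes=3")
    return text, LABEL_NAMES[label_id]


if __name__ == "__main__":
    print(sum(SCICITE_SPLITS.values()) == SCICITE_ALL)
    print(decode_pair(("In chacma baboons, male-infant relationships ...", 1)))
```

Real TFDS pairs arrive as tensors, so convert them first (for example with `tfds.as_numpy`) before calling a pure-Python helper like this one.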

docs/catalog/wikipedia.md

Lines changed: 0 additions & 3 deletions
@@ -2340,7 +2340,6 @@ Versions:
 * `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features
@@ -4618,7 +4617,6 @@ Versions:
 * `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features
@@ -6321,7 +6319,6 @@ Versions:
 * `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features
