Commit f679fc1

Conchylicultor authored and copybara-github committed

Automated documentation update

PiperOrigin-RevId: 296311032
1 parent ff4e90c commit f679fc1

File tree

9 files changed: +152 additions, -31 deletions


docs/catalog/_toc.yaml

Lines changed: 2 additions & 0 deletions
@@ -210,6 +210,8 @@ toc:
     title: multi_news
   - path: /datasets/catalog/newsroom
     title: newsroom (manual)
+  - path: /datasets/catalog/opinosis
+    title: opinosis
   - path: /datasets/catalog/reddit_tifu
     title: reddit_tifu
   - path: /datasets/catalog/scientific_papers

docs/catalog/beans.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ and collected by the Makerere AI research lab.
 *   **Dataset size**: `171.63 MiB`
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes (validation, test), Only when `shuffle_files=False` (train)
+    Yes (test, validation), Only when `shuffle_files=False` (train)
 *   **Splits**:
 
 Split | Examples
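The auto-caching entry above encodes a two-part rule: the test and validation splits are always auto-cached, while the train split is auto-cached only when `shuffle_files=False`. A minimal sketch of that documented rule as a predicate (`is_auto_cached` is a hypothetical helper for illustration, not part of the `tensorflow_datasets` API):

```python
def is_auto_cached(split: str, shuffle_files: bool = False) -> bool:
    """Return True if the beans split would be auto-cached, per the docs above."""
    if split in ("test", "validation"):
        return True  # always auto-cached
    if split == "train":
        return not shuffle_files  # only when shuffle_files=False
    raise ValueError(f"unknown split: {split!r}")

print(is_auto_cached("train", shuffle_files=True))  # False
```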

docs/catalog/c4.md

Lines changed: 9 additions & 27 deletions
@@ -27,11 +27,12 @@ https://www.tensorflow.org/datasets/beam_datasets.
 *   **Source code**:
     [`tfds.text.c4.C4`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/c4.py)
 *   **Versions**:
-    *   **`2.2.0`** (default): No release notes.
+    *   **`2.2.1`** (default): Update dataset_info.json
+    *   `2.2.0`: No release notes.
     *   `1.1.0`: No release notes.
     *   `1.0.1`: No release notes.
     *   `1.0.0`: No release notes.
-*   **Download size**: `6.96 TiB`
+*   **Download size**: `Unknown size`
 *   **Dataset size**: `Unknown size`
 *   **Manual download instructions**: This dataset requires you to download the
     source data manually into `download_config.manual_dir`
@@ -43,7 +44,12 @@ https://www.tensorflow.org/datasets/beam_datasets.
     `manual_dir`.
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    No
+    Yes
+*   **Splits**:
+
+    Split | Examples
+    :---- | -------:
+
 *   **Features**:
 
 ```python
@@ -74,44 +80,20 @@ FeaturesDict({
 ## c4/en (default config)
 
 *   **Config description**: English C4 dataset.
-*   **Splits**:
-
-    Split        | Examples
-    :----------- | ----------:
-    'train'      | 364,684,602
-    'validation' | 364,525
 
 ## c4/en.noclean
 
 *   **Config description**: Disables all cleaning (deduplication, removal based
     on bad words, etc.)
-*   **Splits**:
-
-    Split        | Examples
-    :----------- | ------------:
-    'train'      | 1,063,805,630
-    'validation' | 1,065,028
 
 ## c4/en.realnewslike
 
 *   **Config description**: Filters from the default config to only include
     content from the domains used in the 'RealNews' dataset (Zellers et al.,
     2019).
-*   **Splits**:
-
-    Split        | Examples
-    :----------- | ---------:
-    'train'      | 13,659,362
-    'validation' | 13,727
 
 ## c4/en.webtextlike
 
 *   **Config description**: Filters from the default config to only include
     content from the URLs in OpenWebText
     (https://github.com/jcpeterson/openwebtext).
-*   **Splits**:
-
-    Split        | Examples
-    :----------- | --------:
-    'train'      | 4,441,108
-    'validation' | 4,417
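The c4 page above says the source data must be placed manually into `download_config.manual_dir` before building. A small sketch of a pre-flight check one might run first; `missing_manual_files` is a hypothetical helper (not part of TFDS), and the filename shown is a placeholder since the real names depend on which Common Crawl dumps you fetch:

```python
import pathlib
import tempfile

def missing_manual_files(manual_dir: str, required_files: list) -> list:
    """Return the required files not yet present in manual_dir (hypothetical helper)."""
    root = pathlib.Path(manual_dir)
    return [name for name in required_files if not (root / name).exists()]

# Demo with a throwaway directory; 'example.warc.wet.gz' is a placeholder name.
manual_dir = tempfile.mkdtemp()
print(missing_manual_files(manual_dir, ["example.warc.wet.gz"]))  # ['example.warc.wet.gz']
```

Running such a check before `tfds.load` avoids a partial build that fails midway through the (very large) Beam pipeline.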

docs/catalog/opinosis.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+  <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+    <meta itemprop="name" content="TensorFlow Datasets" />
+  </div>
+
+  <meta itemprop="name" content="opinosis" />
+  <meta itemprop="description" content="&#10;The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.&#10;Topics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;opinosis&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+  <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/opinosis" />
+  <meta itemprop="sameAs" content="http://kavita-ganesan.com/opinosis/" />
+  <meta itemprop="citation" content="&#10;@inproceedings{ganesan2010opinosis,&#10; title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},&#10; author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},&#10; booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},&#10; pages={340--348},&#10; year={2010},&#10; organization={Association for Computational Linguistics}&#10;}&#10;" />
+</div>
+
+# `opinosis`
+
+*   **Description**:
+
+The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51
+topics. Topics and opinions are obtained from Tripadvisor, Edmunds.com and
+Amazon.com.
+
+*   **Homepage**:
+    [http://kavita-ganesan.com/opinosis/](http://kavita-ganesan.com/opinosis/)
+*   **Source code**:
+    [`tfds.summarization.opinosis.Opinosis`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/opinosis.py)
+*   **Versions**:
+    *   **`1.0.0`** (default): No release notes.
+*   **Download size**: `739.65 KiB`
+*   **Dataset size**: `725.45 KiB`
+*   **Auto-cached**
+    ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
+    Yes
+*   **Splits**:
+
+    Split   | Examples
+    :------ | -------:
+    'train' | 51
+
+*   **Features**:
+
+```python
+FeaturesDict({
+    'review_sents': Text(shape=(), dtype=tf.string),
+    'summaries': Sequence(Text(shape=(), dtype=tf.string)),
+})
+```
+
+*   **Supervised keys** (See
+    [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
+    `('review_sents', 'summaries')`
+*   **Citation**:
+
+```
+@inproceedings{ganesan2010opinosis,
+  title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
+  author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
+  booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
+  pages={340--348},
+  year={2010},
+  organization={Association for Computational Linguistics}
+}
+```
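The new page lists the supervised keys as `('review_sents', 'summaries')`, meaning `as_supervised=True` yields (input, target) pairs in that order. A minimal sketch of that mapping on a plain feature dict; the sample record is invented for illustration and `to_supervised` is a hypothetical stand-in for what TFDS does internally:

```python
# Supervised keys from the opinosis catalog page above.
SUPERVISED_KEYS = ("review_sents", "summaries")

def to_supervised(example: dict) -> tuple:
    """Map a feature dict to an (input, target) pair, as as_supervised=True would."""
    inp, target = SUPERVISED_KEYS
    return example[inp], example[target]

# Invented sample record matching the documented feature structure.
record = {
    "review_sents": "The battery life is great. It charges quickly.",
    "summaries": ["Great battery life."],
}
x, y = to_supervised(record)
print(x)  # the concatenated review sentences (model input)
print(y)  # the list of human summaries (target)
```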

docs/catalog/overview.md

Lines changed: 1 addition & 0 deletions
@@ -139,6 +139,7 @@ np_datasets = tfds.as_numpy(datasets)
 *   [`gigaword`](gigaword.md)
 *   [`multi_news`](multi_news.md)
 *   [`newsroom`](newsroom.md)
+*   [`opinosis`](opinosis.md)
 *   [`reddit_tifu`](reddit_tifu.md)
 *   [`scientific_papers`](scientific_papers.md)
 *   [`wikihow`](wikihow.md)

docs/catalog/qa4mre.md

Lines changed: 5 additions & 3 deletions
@@ -2,12 +2,14 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
   <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
+
 <meta itemprop="name" content="qa4mre" />
 <meta itemprop="description" content="&#10;QA4MRE dataset was created for the CLEF 2011/2012/2013 shared tasks to promote research in &#10;question answering and reading comprehension. The dataset contains a supporting &#10;passage and a set of questions corresponding to the passage. Multiple options &#10;for answers are provided for each question, of which only one is correct. The &#10;training and test datasets are available for the main track.&#10;Additional gold standard documents are available for two pilot studies: one on &#10;alzheimers data, and the other on entrance exams data.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;qa4mre&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/qa4mre" />
 <meta itemprop="sameAs" content="http://nlp.uned.es/clef-qa/repository/pastCampaigns.php" />
-<meta itemprop="citation" content="&#10;@InProceedings{10.1007/978-3-642-40802-1_29,&#10;author=&quot;Pe{\~{n}}as, Anselmo&#10;and Hovy, Eduard&#10;and Forner, Pamela&#10;and Rodrigo, {&#x27;A}lvaro&#10;and Sutcliffe, Richard&#10;and Morante, Roser&quot;,&#10;editor=&quot;Forner, Pamela&#10;and M{&quot;u}ller, Henning&#10;and Paredes, Roberto&#10;and Rosso, Paolo&#10;and Stein, Benno&quot;,&#10;title=&quot;QA4MRE 2011-2013: Overview of Question Answering for Machine Reading Evaluation&quot;,&#10;booktitle=&quot;Information Access Evaluation. Multilinguality, Multimodality, and Visualization&quot;,&#10;year=&quot;2013&quot;,&#10;publisher=&quot;Springer Berlin Heidelberg&quot;,&#10;address=&quot;Berlin, Heidelberg&quot;,&#10;pages=&quot;303--320&quot;,&#10;abstract=&quot;This paper describes the methodology for testing the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. This was the attempt of the QA4MRE challenge which was run as a Lab at CLEF 2011--2013. The traditional QA task was replaced by a new Machine Reading task, whose intention was to ask questions that required a deep knowledge of individual short texts and in which systems were required to choose one answer, by analysing the corresponding test document in conjunction with background text collections provided by the organization. Four different tasks have been organized during these years: Main Task, Processing Modality and Negation for Machine Reading, Machine Reading of Biomedical Texts about Alzheimer&#x27;s disease, and Entrance Exams. This paper describes their motivation, their goals, their methodology for preparing the data sets, their background collections, their metrics used for the evaluation, and the lessons learned along these three years.&quot;,&#10;isbn=&quot;978-3-642-40802-1&quot;&#10;}&#10;" />
+<meta itemprop="citation" content="&#10;@InProceedings{10.1007/978-3-642-40802-1_29,&#10;author=&quot;Pe{\~{n}}as, Anselmo&#10;and Hovy, Eduard&#10;and Forner, Pamela&#10;and Rodrigo, {\&#x27;A}lvaro&#10;and Sutcliffe, Richard&#10;and Morante, Roser&quot;,&#10;editor=&quot;Forner, Pamela&#10;and M{\&quot;u}ller, Henning&#10;and Paredes, Roberto&#10;and Rosso, Paolo&#10;and Stein, Benno&quot;,&#10;title=&quot;QA4MRE 2011-2013: Overview of Question Answering for Machine Reading Evaluation&quot;,&#10;booktitle=&quot;Information Access Evaluation. Multilinguality, Multimodality, and Visualization&quot;,&#10;year=&quot;2013&quot;,&#10;publisher=&quot;Springer Berlin Heidelberg&quot;,&#10;address=&quot;Berlin, Heidelberg&quot;,&#10;pages=&quot;303--320&quot;,&#10;abstract=&quot;This paper describes the methodology for testing the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. This was the attempt of the QA4MRE challenge which was run as a Lab at CLEF 2011--2013. The traditional QA task was replaced by a new Machine Reading task, whose intention was to ask questions that required a deep knowledge of individual short texts and in which systems were required to choose one answer, by analysing the corresponding test document in conjunction with background text collections provided by the organization. Four different tasks have been organized during these years: Main Task, Processing Modality and Negation for Machine Reading, Machine Reading of Biomedical Texts about Alzheimer&#x27;s disease, and Entrance Exams. This paper describes their motivation, their goals, their methodology for preparing the data sets, their background collections, their metrics used for the evaluation, and the lessons learned along these three years.&quot;,&#10;isbn=&quot;978-3-642-40802-1&quot;&#10;}&#10;" />
 </div>
+
 # `qa4mre`
 
 *   **Description**:
@@ -60,11 +62,11 @@ FeaturesDict({
 author="Pe{\~{n}}as, Anselmo
 and Hovy, Eduard
 and Forner, Pamela
-and Rodrigo, {'A}lvaro
+and Rodrigo, {\'A}lvaro
 and Sutcliffe, Richard
 and Morante, Roser",
 editor="Forner, Pamela
-and M{"u}ller, Henning
+and M{\"u}ller, Henning
 and Paredes, Roberto
 and Rosso, Paolo
 and Stein, Benno",

tensorflow_datasets/testing/metadata/missing.txt

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
 # This is used for reference and debugging.
 bigearthnet/all/0.0.2
 bigearthnet/rgb/0.0.2
+c4/en.noclean/2.2.1
+c4/en.realnewslike/2.2.1
+c4/en.webtextlike/2.2.1
+c4/en/2.2.1
 cityscapes/semantic_segmentation/1.0.0
 cityscapes/semantic_segmentation_extra/1.0.0
 cityscapes/stereo_disparity/1.0.0
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+{
+  "citation": "\n@inproceedings{ganesan2010opinosis,\n title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},\n author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},\n booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},\n pages={340--348},\n year={2010},\n organization={Association for Computational Linguistics}\n}\n",
+  "description": "\nThe Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.\nTopics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.\n",
+  "downloadSize": "757398",
+  "location": {
+    "urls": [
+      "http://kavita-ganesan.com/opinosis/"
+    ]
+  },
+  "name": "opinosis",
+  "schema": {
+    "feature": [
+      {
+        "name": "review_sents",
+        "type": "BYTES"
+      },
+      {
+        "name": "summaries",
+        "shape": {
+          "dim": [
+            {
+              "size": "-1"
+            }
+          ]
+        },
+        "type": "BYTES"
+      }
+    ]
+  },
+  "splits": [
+    {
+      "name": "train",
+      "numBytes": "742862",
+      "numShards": "1",
+      "shardLengths": [
+        "51"
+      ],
+      "statistics": {
+        "features": [
+          {
+            "bytesStats": {
+              "commonStats": {
+                "numNonMissing": "51"
+              }
+            },
+            "name": "review_sents",
+            "type": "BYTES"
+          },
+          {
+            "bytesStats": {
+              "commonStats": {
+                "numNonMissing": "51"
+              }
+            },
+            "name": "summaries",
+            "type": "BYTES"
+          }
+        ],
+        "numExamples": "51"
+      }
+    }
+  ],
+  "supervisedKeys": {
+    "input": "review_sents",
+    "output": "summaries"
+  },
+  "version": "1.0.0"
+}
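The dataset_info.json added above is plain JSON, so its split metadata can be inspected with nothing beyond the stdlib. A sketch using an abbreviated copy of the file (only the fields needed here), checking the invariant that shard lengths sum to the split's example count:

```python
import json

# Abbreviated copy of the opinosis dataset_info.json above, for illustration.
INFO = json.loads("""
{
  "name": "opinosis",
  "downloadSize": "757398",
  "splits": [
    {
      "name": "train",
      "numBytes": "742862",
      "numShards": "1",
      "shardLengths": ["51"],
      "statistics": {"numExamples": "51"}
    }
  ],
  "supervisedKeys": {"input": "review_sents", "output": "summaries"},
  "version": "1.0.0"
}
""")

# Shard lengths must sum to the split's example count.
train = INFO["splits"][0]
expected = int(train["statistics"]["numExamples"])
assert sum(int(n) for n in train["shardLengths"]) == expected
print(INFO["name"], train["name"], expected)  # opinosis train 51
```

Note that numeric fields are stored as strings (a protobuf JSON convention), hence the `int(...)` conversions.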

tensorflow_datasets/testing/metadata/supported.txt

Lines changed: 1 addition & 0 deletions
@@ -761,6 +761,7 @@ open_images_v4/300k/0.2.1
 open_images_v4/300k/2.0.0
 open_images_v4/original/0.2.0
 open_images_v4/original/2.0.0
+opinosis/1.0.0
 oxford_flowers102/0.0.1
 oxford_flowers102/2.0.0
 oxford_iiit_pet/3.1.0
