Commit 3c69b42

Conchylicultor authored and copybara-github committed
Automated documentation update
PiperOrigin-RevId: 296520398
1 parent 2e015d4 commit 3c69b42

14 files changed, 149 insertions(+), 12 deletions(-)

docs/catalog/_toc.yaml

Lines changed: 4 additions & 0 deletions

@@ -4,6 +4,8 @@ toc:
 - section:
   - path: /datasets/catalog/groove
     title: groove
+  - path: /datasets/catalog/librispeech
+    title: librispeech
   - path: /datasets/catalog/nsynth
     title: nsynth
   title: Audio
@@ -242,6 +244,8 @@ toc:
     title: glue
   - path: /datasets/catalog/imdb_reviews
     title: imdb_reviews
+  - path: /datasets/catalog/librispeech_lm
+    title: librispeech_lm
   - path: /datasets/catalog/lm1b
     title: lm1b
   - path: /datasets/catalog/math_dataset

docs/catalog/beans.md

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ and collected by the Makerere AI research lab.
 *   **Dataset size**: `171.63 MiB`
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes (test, validation), Only when `shuffle_files=False` (train)
+    Yes (validation, test), Only when `shuffle_files=False` (train)
 *   **Splits**:

 Split | Examples

docs/catalog/c4.md

Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@ https://www.tensorflow.org/datasets/beam_datasets.
     `manual_dir`.
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

docs/catalog/cityscapes.md

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@ get the files.
 Other configs do require additional files - please see code for more details.
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

docs/catalog/cnn_dailymail.md

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ each highlight, which is the target summary
 *   **Dataset size**: `Unknown size`
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

docs/catalog/image_label_folder.md

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ Generic image classification dataset.
 This is a 'template' dataset.
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

docs/catalog/librispeech.md

Lines changed: 75 additions & 0 deletions

@@ -0,0 +1,75 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+  <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+    <meta itemprop="name" content="TensorFlow Datasets" />
+  </div>
+
+  <meta itemprop="name" content="librispeech" />
+  <meta itemprop="description" content="LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz,&#10;prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read&#10;audiobooks from the LibriVox project, and has been carefully segmented and aligned.87&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;librispeech&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+  <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/librispeech" />
+  <meta itemprop="sameAs" content="http://www.openslr.org/12" />
+  <meta itemprop="citation" content="@inproceedings{panayotov2015librispeech,&#10; title={Librispeech: an ASR corpus based on public domain audio books},&#10; author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},&#10; booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},&#10; pages={5206--5210},&#10; year={2015},&#10; organization={IEEE}&#10;}&#10;" />
+</div>
+
+# `librispeech`
+
+*   **Description**:
+
+LibriSpeech is a corpus of approximately 1000 hours of read English speech with
+sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of
+Daniel Povey. The data is derived from read audiobooks from the LibriVox
+project, and has been carefully segmented and aligned.87
+
+*   **Homepage**: [http://www.openslr.org/12](http://www.openslr.org/12)
+*   **Source code**:
+    [`tfds.audio.librispeech.Librispeech`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/librispeech.py)
+*   **Versions**:
+    *   **`1.1.0`** (default): No release notes.
+*   **Download size**: `Unknown size`
+*   **Dataset size**: `Unknown size`
+*   **Auto-cached**
+    ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
+    Unknown
+*   **Splits**:
+
+Split | Examples
+:---- | -------:
+
+*   **Features**:
+
+```python
+FeaturesDict({
+    'chapter_id': Tensor(shape=(), dtype=tf.int64),
+    'id': Tensor(shape=(), dtype=tf.string),
+    'speaker_id': Tensor(shape=(), dtype=tf.int64),
+    'speech': Audio(shape=(None,), dtype=tf.int64),
+    'text': Text(shape=(), dtype=tf.string),
+})
+```
+
+*   **Supervised keys** (See
+    [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
+    `('speech', 'text')`
+*   **Citation**:
+
+```
+@inproceedings{panayotov2015librispeech,
+  title={Librispeech: an ASR corpus based on public domain audio books},
+  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
+  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
+  pages={5206--5210},
+  year={2015},
+  organization={IEEE}
+}
+```
+
+## librispeech/plain_text (default config)
+
+*   **Config description**: Transcriptions are in plain text.
+
+## librispeech/subwords8k
+
+*   **Config description**: Transcriptions use the SubwordTextEncoder
+
+## librispeech/subwords32k
+
+*   **Config description**: Transcriptions use the SubwordTextEncoder
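The supervised keys `('speech', 'text')` above mean that loading with `as_supervised=True` collapses each `FeaturesDict` example to an `(input, target)` pair. A minimal pure-Python sketch of that mapping (no TensorFlow required; the example dict below is hypothetical, merely mirroring the feature names in the diff):

```python
# Sketch of the as_supervised=True mapping for librispeech:
# the full feature dict collapses to (input, target) using the
# dataset's supervised keys ('speech', 'text').

SUPERVISED_KEYS = ("speech", "text")

def to_supervised(example, keys=SUPERVISED_KEYS):
    """Map a full example dict to an (input, target) tuple."""
    inp, target = keys
    return example[inp], example[target]

# Hypothetical example shaped like the FeaturesDict above.
example = {
    "chapter_id": 141231,
    "id": "1272-141231-0000",
    "speaker_id": 1272,
    "speech": [0, 12, -5, 7],      # int64 audio samples in the real dataset
    "text": "A MAN SAID TO THE UNIVERSE",
}

speech, text = to_supervised(example)
print(text)  # A MAN SAID TO THE UNIVERSE
```

The real pipeline does the same projection on tensors inside `tf.data`, so the remaining keys (`id`, `speaker_id`, `chapter_id`) are dropped from the training tuple.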

docs/catalog/librispeech_lm.md

Lines changed: 57 additions & 0 deletions

@@ -0,0 +1,57 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+  <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+    <meta itemprop="name" content="TensorFlow Datasets" />
+  </div>
+
+  <meta itemprop="name" content="librispeech_lm" />
+  <meta itemprop="description" content="Language modeling resources to be used in conjunction with the LibriSpeech ASR corpus.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;librispeech_lm&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+  <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/librispeech_lm" />
+  <meta itemprop="sameAs" content="http://www.openslr.org/11" />
+  <meta itemprop="citation" content="@inproceedings{panayotov2015librispeech,&#10; title={Librispeech: an ASR corpus based on public domain audio books},&#10; author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},&#10; booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},&#10; pages={5206--5210},&#10; year={2015},&#10; organization={IEEE}&#10;}&#10;" />
+</div>
+
+# `librispeech_lm`
+
+*   **Description**:
+
+Language modeling resources to be used in conjunction with the LibriSpeech ASR
+corpus.
+
+*   **Homepage**: [http://www.openslr.org/11](http://www.openslr.org/11)
+*   **Source code**:
+    [`tfds.text.librispeech_lm.LibrispeechLm`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/librispeech_lm.py)
+*   **Versions**:
+    *   **`0.1.0`** (default): No release notes.
+*   **Download size**: `Unknown size`
+*   **Dataset size**: `Unknown size`
+*   **Auto-cached**
+    ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
+    Unknown
+*   **Splits**:
+
+Split | Examples
+:---- | -------:
+
+*   **Features**:
+
+```python
+FeaturesDict({
+    'text': Text(shape=(), dtype=tf.string),
+})
+```
+
+*   **Supervised keys** (See
+    [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
+    `('text', 'text')`
+*   **Citation**:
+
+```
+@inproceedings{panayotov2015librispeech,
+  title={Librispeech: an ASR corpus based on public domain audio books},
+  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
+  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
+  pages={5206--5210},
+  year={2015},
+  organization={IEEE}
+}
+```
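The degenerate supervised keys `('text', 'text')` above are typical for a language-modeling corpus: input and target are the same string, and the training pipeline derives its own (context, next-token) pairs by shifting. A hypothetical whitespace-token sketch of that derivation (this helper is not part of tensorflow_datasets):

```python
# For a language-modeling corpus the supervised pair is (text, text);
# (context, next_token) training pairs are derived by shifting the
# token sequence against itself. Hypothetical illustration only.

def next_token_pairs(text):
    """Split text into (context, next_token) training pairs."""
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("the cat sat down")
print(pairs[0])  # (['the'], 'cat')
```

Real models work on subword or character ids rather than whitespace tokens, but the shift-by-one structure is the same.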

docs/catalog/opinosis.md

Lines changed: 0 additions & 3 deletions

@@ -2,14 +2,12 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
   <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
-
 <meta itemprop="name" content="opinosis" />
 <meta itemprop="description" content="&#10;The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.&#10;Topics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;opinosis&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/opinosis" />
 <meta itemprop="sameAs" content="http://kavita-ganesan.com/opinosis/" />
 <meta itemprop="citation" content="&#10;@inproceedings{ganesan2010opinosis,&#10; title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},&#10; author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},&#10; booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},&#10; pages={340--348},&#10; year={2010},&#10; organization={Association for Computational Linguistics}&#10;}&#10;" />
 </div>
-
 # `opinosis`

 *   **Description**:
@@ -43,7 +41,6 @@ FeaturesDict({
     'summaries': Sequence(Text(shape=(), dtype=tf.string)),
 })
 ```
-
 *   **Supervised keys** (See
     [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
     `('review_sents', 'summaries')`

docs/catalog/overview.md

Lines changed: 2 additions & 0 deletions

@@ -36,6 +36,7 @@ np_datasets = tfds.as_numpy(datasets)

 *   `Audio`
     *   [`groove`](groove.md)
+    *   [`librispeech`](librispeech.md)
     *   [`nsynth`](nsynth.md)
 *   `Image`
     *   [`abstract_reasoning`](abstract_reasoning.md)
@@ -155,6 +156,7 @@ np_datasets = tfds.as_numpy(datasets)
     *   [`gap`](gap.md)
     *   [`glue`](glue.md)
     *   [`imdb_reviews`](imdb_reviews.md)
+    *   [`librispeech_lm`](librispeech_lm.md)
     *   [`lm1b`](lm1b.md)
     *   [`math_dataset`](math_dataset.md)
     *   [`movie_rationales`](movie_rationales.md)

docs/catalog/qa4mre.md

Lines changed: 0 additions & 2 deletions

@@ -2,14 +2,12 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
   <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
-
 <meta itemprop="name" content="qa4mre" />
 <meta itemprop="description" content="&#10;QA4MRE dataset was created for the CLEF 2011/2012/2013 shared tasks to promote research in &#10;question answering and reading comprehension. The dataset contains a supporting &#10;passage and a set of questions corresponding to the passage. Multiple options &#10;for answers are provided for each question, of which only one is correct. The &#10;training and test datasets are available for the main track.&#10;Additional gold standard documents are available for two pilot studies: one on &#10;alzheimers data, and the other on entrance exams data.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;qa4mre&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/qa4mre" />
 <meta itemprop="sameAs" content="http://nlp.uned.es/clef-qa/repository/pastCampaigns.php" />
 <meta itemprop="citation" content="&#10;@InProceedings{10.1007/978-3-642-40802-1_29,&#10;author=&quot;Pe{\~{n}}as, Anselmo&#10;and Hovy, Eduard&#10;and Forner, Pamela&#10;and Rodrigo, {\&#x27;A}lvaro&#10;and Sutcliffe, Richard&#10;and Morante, Roser&quot;,&#10;editor=&quot;Forner, Pamela&#10;and M{\&quot;u}ller, Henning&#10;and Paredes, Roberto&#10;and Rosso, Paolo&#10;and Stein, Benno&quot;,&#10;title=&quot;QA4MRE 2011-2013: Overview of Question Answering for Machine Reading Evaluation&quot;,&#10;booktitle=&quot;Information Access Evaluation. Multilinguality, Multimodality, and Visualization&quot;,&#10;year=&quot;2013&quot;,&#10;publisher=&quot;Springer Berlin Heidelberg&quot;,&#10;address=&quot;Berlin, Heidelberg&quot;,&#10;pages=&quot;303--320&quot;,&#10;abstract=&quot;This paper describes the methodology for testing the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. This was the attempt of the QA4MRE challenge which was run as a Lab at CLEF 2011--2013. The traditional QA task was replaced by a new Machine Reading task, whose intention was to ask questions that required a deep knowledge of individual short texts and in which systems were required to choose one answer, by analysing the corresponding test document in conjunction with background text collections provided by the organization. Four different tasks have been organized during these years: Main Task, Processing Modality and Negation for Machine Reading, Machine Reading of Biomedical Texts about Alzheimer&#x27;s disease, and Entrance Exams. This paper describes their motivation, their goals, their methodology for preparing the data sets, their background collections, their metrics used for the evaluation, and the lessons learned along these three years.&quot;,&#10;isbn=&quot;978-3-642-40802-1&quot;&#10;}&#10;" />
 </div>
-
 # `qa4mre`

 *   **Description**:

docs/catalog/so2sat.md

Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ http://creativecommons.org/licenses/by/4.0
 *   **Dataset size**: `Unknown size`
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

docs/catalog/wmt18_translate.md

Lines changed: 1 addition & 1 deletion

@@ -49,7 +49,7 @@ builder = tfds.builder("wmt_translate", config=config)
 be downloaded.
 *   **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Yes
+    Unknown
 *   **Splits**:

 Split | Examples

tensorflow_datasets/testing/metadata/missing.txt

Lines changed: 4 additions & 0 deletions

@@ -16,6 +16,10 @@ cnn_dailymail/subwords32k/3.0.0
 diabetic_retinopathy_detection/btgraham-300/1.0.0
 glue/ax/0.0.2
 image_label_folder/2.0.0
+librispeech/plain_text/1.1.0
+librispeech/subwords32k/1.1.0
+librispeech/subwords8k/1.1.0
+librispeech_lm/0.1.0
 oxford_iiit_pet/1.2.0
 so2sat/all/0.0.1
 so2sat/all/2.0.0
