Commit 3c69b42

Conchylicultor authored and copybara-github committed
Automated documentation update
PiperOrigin-RevId: 296520398
1 parent 2e015d4 commit 3c69b42

14 files changed: +149 -12 lines changed

docs/catalog/_toc.yaml

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,8 @@ toc:
 - section:
 - path: /datasets/catalog/groove
 title: groove
+- path: /datasets/catalog/librispeech
+title: librispeech
 - path: /datasets/catalog/nsynth
 title: nsynth
 title: Audio
@@ -242,6 +244,8 @@ toc:
 title: glue
 - path: /datasets/catalog/imdb_reviews
 title: imdb_reviews
+- path: /datasets/catalog/librispeech_lm
+title: librispeech_lm
 - path: /datasets/catalog/lm1b
 title: lm1b
 - path: /datasets/catalog/math_dataset

docs/catalog/beans.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ and collected by the Makerere AI research lab.
 * **Dataset size**: `171.63 MiB`
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-Yes (test, validation), Only when `shuffle_files=False` (train)
+Yes (validation, test), Only when `shuffle_files=False` (train)
 * **Splits**:
 
 Split | Examples
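
The **Auto-cached** values in these hunks (`Yes`, `Unknown`, ``Only when `shuffle_files=False` ``) can be read as the outcome of a small per-split decision rule. The sketch below is a plain-Python illustration, not the TFDS implementation: the function name `split_auto_cache_status`, the 250 MiB threshold, and the shard counts in the usage lines are assumptions made for illustration, based on the auto-caching conditions described in the linked performance guide.

```python
from typing import Optional

# Assumed threshold: the performance guide describes auto-caching for
# datasets whose total size is known and below 250 MiB.
MAX_AUTO_CACHE_BYTES = 250 * 1024 * 1024


def split_auto_cache_status(dataset_size_bytes: Optional[int], num_shards: int) -> str:
    """Return a catalog-style auto-cache label for one split (hypothetical helper)."""
    if dataset_size_bytes is None:
        # No recorded size (shown as `Unknown size`), so nothing can be decided.
        return "Unknown"
    if dataset_size_bytes >= MAX_AUTO_CACHE_BYTES:
        return "No"
    if num_shards > 1:
        # Several shards: caching is only safe when file order is not shuffled.
        return "Only when `shuffle_files=False`"
    return "Yes"


# Hypothetical inputs: a 171.63 MiB dataset with a multi-shard train split.
beans_size = int(171.63 * 1024 * 1024)
print(split_auto_cache_status(beans_size, num_shards=1))
print(split_auto_cache_status(beans_size, num_shards=2))
print(split_auto_cache_status(None, num_shards=1))
```

Under these assumptions, a dataset whose size is recorded as `Unknown size` can only be reported as `Unknown`.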

docs/catalog/c4.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ https://www.tensorflow.org/datasets/beam_datasets.
 `manual_dir`.
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-Yes
+Unknown
 * **Splits**:
 
 Split | Examples

docs/catalog/cityscapes.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ get the files.
 Other configs do require additional files - please see code for more details.
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-Yes
+Unknown
 * **Splits**:
 
 Split | Examples

docs/catalog/cnn_dailymail.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ each highlight, which is the target summary
 * **Dataset size**: `Unknown size`
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-Yes
+Unknown
 * **Splits**:
 
 Split | Examples

docs/catalog/image_label_folder.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Generic image classification dataset.
 This is a 'template' dataset.
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-Yes
+Unknown
 * **Splits**:
 
 Split | Examples

docs/catalog/librispeech.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+<div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+<meta itemprop="name" content="TensorFlow Datasets" />
+</div>
+
+<meta itemprop="name" content="librispeech" />
+<meta itemprop="description" content="LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz,&#10;prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read&#10;audiobooks from the LibriVox project, and has been carefully segmented and aligned.87&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;librispeech&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+<meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/librispeech" />
+<meta itemprop="sameAs" content="http://www.openslr.org/12" />
+<meta itemprop="citation" content="@inproceedings{panayotov2015librispeech,&#10; title={Librispeech: an ASR corpus based on public domain audio books},&#10; author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},&#10; booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},&#10; pages={5206--5210},&#10; year={2015},&#10; organization={IEEE}&#10;}&#10;" />
+</div>
+
+# `librispeech`
+
+* **Description**:
+
+LibriSpeech is a corpus of approximately 1000 hours of read English speech with
+sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of
+Daniel Povey. The data is derived from read audiobooks from the LibriVox
+project, and has been carefully segmented and aligned.87
+
+* **Homepage**: [http://www.openslr.org/12](http://www.openslr.org/12)
+* **Source code**:
+[`tfds.audio.librispeech.Librispeech`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/librispeech.py)
+* **Versions**:
+* **`1.1.0`** (default): No release notes.
+* **Download size**: `Unknown size`
+* **Dataset size**: `Unknown size`
+* **Auto-cached**
+([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
+Unknown
+* **Splits**:
+
+Split | Examples
+:---- | -------:
+
+* **Features**:
+
+```python
+FeaturesDict({
+'chapter_id': Tensor(shape=(), dtype=tf.int64),
+'id': Tensor(shape=(), dtype=tf.string),
+'speaker_id': Tensor(shape=(), dtype=tf.int64),
+'speech': Audio(shape=(None,), dtype=tf.int64),
+'text': Text(shape=(), dtype=tf.string),
+})
+```
+
+* **Supervised keys** (See
+[`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
+`('speech', 'text')`
+* **Citation**:
+
+```
+@inproceedings{panayotov2015librispeech,
+title={Librispeech: an ASR corpus based on public domain audio books},
+author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
+booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
+pages={5206--5210},
+year={2015},
+organization={IEEE}
+}
+```
+
+## librispeech/plain_text (default config)
+
+* **Config description**: Transcriptions are in plain text.
+
+## librispeech/subwords8k
+
+* **Config description**: Transcriptions use the SubwordTextEncoder
+
+## librispeech/subwords32k
+
+* **Config description**: Transcriptions use the SubwordTextEncoder
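
The supervised keys `('speech', 'text')` in the new `librispeech.md` page name which two features form the (input, target) pair when the dataset is loaded in supervised mode. The following is a minimal pure-Python sketch of that mapping, not TFDS code: the helper `to_supervised_pair` and every value in the example record are hypothetical, and only the feature names come from the `FeaturesDict` in the diff.

```python
from typing import Any, Dict, Tuple


def to_supervised_pair(
    example: Dict[str, Any],
    supervised_keys: Tuple[str, str] = ("speech", "text"),
) -> Tuple[Any, Any]:
    """Project a feature dict onto its (input, target) pair (hypothetical helper)."""
    input_key, target_key = supervised_keys
    return example[input_key], example[target_key]


# Hypothetical record shaped like the documented features; all values are made up.
example = {
    "chapter_id": 0,
    "id": b"hypothetical-utterance-id",
    "speaker_id": 0,
    "speech": [0, 1, -1],  # stand-in for an int64 audio sample sequence
    "text": b"hello world",
}

speech, text = to_supervised_pair(example)
print(len(speech), text)
```

The remaining features (`chapter_id`, `id`, `speaker_id`) are simply dropped from the pair in this sketch.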

docs/catalog/librispeech_lm.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+<div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+<meta itemprop="name" content="TensorFlow Datasets" />
+</div>
+
+<meta itemprop="name" content="librispeech_lm" />
+<meta itemprop="description" content="Language modeling resources to be used in conjunction with the LibriSpeech ASR corpus.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;librispeech_lm&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+<meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/librispeech_lm" />
+<meta itemprop="sameAs" content="http://www.openslr.org/11" />
+<meta itemprop="citation" content="@inproceedings{panayotov2015librispeech,&#10; title={Librispeech: an ASR corpus based on public domain audio books},&#10; author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},&#10; booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},&#10; pages={5206--5210},&#10; year={2015},&#10; organization={IEEE}&#10;}&#10;" />
+</div>
+
+# `librispeech_lm`
+
+* **Description**:
+
+Language modeling resources to be used in conjunction with the LibriSpeech ASR
+corpus.
+
+* **Homepage**: [http://www.openslr.org/11](http://www.openslr.org/11)
+* **Source code**:
+[`tfds.text.librispeech_lm.LibrispeechLm`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/librispeech_lm.py)
+* **Versions**:
+* **`0.1.0`** (default): No release notes.
+* **Download size**: `Unknown size`
+* **Dataset size**: `Unknown size`
+* **Auto-cached**
+([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
+Unknown
+* **Splits**:
+
+Split | Examples
+:---- | -------:
+
+* **Features**:
+
+```python
+FeaturesDict({
+'text': Text(shape=(), dtype=tf.string),
+})
+```
+
+* **Supervised keys** (See
+[`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
+`('text', 'text')`
+* **Citation**:
+
+```
+@inproceedings{panayotov2015librispeech,
+title={Librispeech: an ASR corpus based on public domain audio books},
+author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
+booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
+pages={5206--5210},
+year={2015},
+organization={IEEE}
+}
+```

docs/catalog/opinosis.md

Lines changed: 0 additions & 3 deletions
@@ -2,14 +2,12 @@
 <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
 <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
-
 <meta itemprop="name" content="opinosis" />
 <meta itemprop="description" content="&#10;The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.&#10;Topics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;opinosis&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/opinosis" />
 <meta itemprop="sameAs" content="http://kavita-ganesan.com/opinosis/" />
 <meta itemprop="citation" content="&#10;@inproceedings{ganesan2010opinosis,&#10; title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},&#10; author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},&#10; booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},&#10; pages={340--348},&#10; year={2010},&#10; organization={Association for Computational Linguistics}&#10;}&#10;" />
 </div>
-
 # `opinosis`
 
 * **Description**:
@@ -43,7 +41,6 @@ FeaturesDict({
 'summaries': Sequence(Text(shape=(), dtype=tf.string)),
 })
 ```
-
 * **Supervised keys** (See
 [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load)):
 `('review_sents', 'summaries')`

docs/catalog/overview.md

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,7 @@ np_datasets = tfds.as_numpy(datasets)
 
 * `Audio`
 * [`groove`](groove.md)
+* [`librispeech`](librispeech.md)
 * [`nsynth`](nsynth.md)
 * `Image`
 * [`abstract_reasoning`](abstract_reasoning.md)
@@ -155,6 +156,7 @@ np_datasets = tfds.as_numpy(datasets)
 * [`gap`](gap.md)
 * [`glue`](glue.md)
 * [`imdb_reviews`](imdb_reviews.md)
+* [`librispeech_lm`](librispeech_lm.md)
 * [`lm1b`](lm1b.md)
 * [`math_dataset`](math_dataset.md)
 * [`movie_rationales`](movie_rationales.md)
