tensorflow
diff --git a/docs/catalog/_toc.yaml b/docs/catalog/_toc.yaml
Lines changed: 3 additions & 1 deletion
diff --git a/docs/catalog/billsum.md b/docs/catalog/billsum.md
Lines changed: 6 additions & 6 deletions
diff --git a/docs/catalog/c4.md b/docs/catalog/c4.md
Lines changed: 35 additions & 1 deletion
diff --git a/docs/catalog/gigaword.md b/docs/catalog/gigaword.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/catalog/overview.md b/docs/catalog/overview.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/catalog/oxford_flowers102.md b/docs/catalog/oxford_flowers102.md
Lines changed: 5 additions & 3 deletions
diff --git a/docs/catalog/scicite.md b/docs/catalog/scicite.md
Lines changed: 85 additions & 0 deletions
diff --git a/docs/catalog/wikipedia.md b/docs/catalog/wikipedia.md
Lines changed: 0 additions & 3 deletions
@@ -205,7 +205,7 @@ toc:
   title: Summarization
 - section:
   - path: /datasets/catalog/c4
-    title: c4
+    title: c4 (manual)
   - path: /datasets/catalog/definite_pronoun_resolution
     title: definite_pronoun_resolution
   - path: /datasets/catalog/esnli
@@ -224,6 +224,8 @@ toc:
     title: multi_nli
   - path: /datasets/catalog/multi_nli_mismatch
     title: multi_nli_mismatch
+  - path: /datasets/catalog/scicite
+    title: scicite
   - path: /datasets/catalog/snli
     title: snli
   - path: /datasets/catalog/squad
 
@@ -21,12 +21,12 @@ summary.
     [https://github.com/FiscalNote/BillSum](https://github.com/FiscalNote/BillSum)
 *   `DatasetBuilder`:
     [`tfds.summarization.billsum.Billsum`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/billsum.py)
-*   Version: `v2.0.0`
+*   Version: `v3.0.0`
 *   Versions:
 
-    *   **`2.0.0`** (default):
+    *   **`3.0.0`** (default):
 
-*   Size: `65.79 MiB`
+*   Size: `64.14 MiB`
 
 ## Features
 ```python
@@ -41,9 +41,9 @@ FeaturesDict({
 
 Split   | Examples
 :------ | -------:
-ALL     | 24,116
-TRAIN   | 19,447
-TEST    | 3,432
+ALL     | 23,455
+TRAIN   | 18,949
+TEST    | 3,269
 CA_TEST | 1,237
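
Review note: the updated split counts can be sanity-checked against the `ALL` row. The snippet below is a minimal sketch that only uses the numbers in this hunk; the helper name `check_split_total` and the lowercase split keys are illustrative assumptions, not part of `tensorflow_datasets`.

```python
def check_split_total(splits, reported_total):
    """Return True when the per-split example counts sum to the reported total."""
    return sum(splits.values()) == reported_total


# Counts copied from the added (+) rows of the statistics table, version 3.0.0.
billsum_v3 = {"train": 18949, "test": 3269, "ca_test": 1237}
# Counts copied from the removed (-) rows, version 2.0.0. CA_TEST is unchanged.
billsum_v2 = {"train": 19447, "test": 3432, "ca_test": 1237}

print(check_split_total(billsum_v3, 23455))
print(check_split_total(billsum_v2, 24116))
```

Both tables are internally consistent, so the drop in `ALL` comes entirely from the `TRAIN` and `TEST` splits.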
 
 ## Homepage
 
@@ -2,13 +2,15 @@
   <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
     <meta itemprop="name" content="TensorFlow Datasets" />
   </div>
+
   <meta itemprop="name" content="c4" />
   <meta itemprop="description" content="A colossal, cleaned version of Common Crawl's web crawl corpus.&#10;&#10;Based on Common Crawl dataset: &quot;https://commoncrawl.org&quot;&#10;&#10;Due to the overhead of cleaning the dataset, it is recommend you prepare it with&#10;a distributed service like Cloud Dataflow. More info at&#10;https://www.tensorflow.org/datasets/beam_datasets.&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('c4', split='train')&#10;for ex in ds.take(4):&#10;  print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
   <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/c4" />
   <meta itemprop="sameAs" content="https://github.com/google-research/text-to-text-transfer-transformer#datasets" />
   <meta itemprop="citation" content="&#10;@article{2019t5,&#10;  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},&#10;  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},&#10;  journal = {arXiv e-prints},&#10;  year = {2019},&#10;  archivePrefix = {arXiv},&#10;  eprint = {1910.10683},&#10;}&#10;" />
 </div>
-# `c4`
+
+# `c4` (Manual download)
 
 A colossal, cleaned version of Common Crawl's web crawl corpus.
 
@@ -48,6 +50,14 @@ Versions:
 *   `1.0.0`: None
 *   `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -75,6 +85,14 @@ Versions:
 *   `1.0.0`: None
 *   `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -104,6 +122,14 @@ Versions:
 *   `1.0.0`: None
 *   `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
 
@@ -133,6 +159,14 @@ Versions:
 *   `1.0.0`: None
 *   `1.0.1`: None
 
+WARNING: This dataset requires you to download the source data manually into
+manual_dir (defaults to `~/tensorflow_datasets/manual/c4/`): For the
+WebText-like config, you must manually download 'OpenWebText.zip' (from
+https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ) and the Common Crawl WET
+files from August 2018 to July 2019
+(https://commoncrawl.org/the-data/get-started/) and place them in the
+`manual_dir`.
+
 ### Statistics
 None computed
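
Review note: the warning added above asks for files to be placed in `manual_dir` before the dataset is prepared. The following is a rough sketch of a pre-flight check under stated assumptions: `missing_manual_files` and `prepare_c4` are hypothetical helpers, only `OpenWebText.zip` is checked because the warning does not name the individual WET files, and the config name is left to the caller since this hunk does not list the config names. `tfds.load`, `download_and_prepare_kwargs`, and `tfds.download.DownloadConfig(manual_dir=...)` are existing `tensorflow_datasets` APIs.

```python
from pathlib import Path

# File name taken from the WARNING text; the WET file names are not given there.
REQUIRED_FILES = ("OpenWebText.zip",)


def missing_manual_files(manual_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in required if not (root / name).exists()]


def prepare_c4(config_name, manual_dir="~/tensorflow_datasets/manual/c4/"):
    """Hypothetical wrapper: fail early, then point tfds at the manual directory."""
    missing = missing_manual_files(manual_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {manual_dir}: {missing}")
    # Imported lazily so the check above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(manual_dir=str(Path(manual_dir).expanduser()))
    return tfds.load(
        f"c4/{config_name}",
        split="train",
        download_and_prepare_kwargs={"download_config": config},
    )


print(missing_manual_files("~/tensorflow_datasets/manual/c4/"))
```

As the dataset description says, preparing `c4` is expensive and a distributed service such as Cloud Dataflow is recommended, so `prepare_c4` is more a statement of where `manual_dir` plugs in than a practical single-machine recipe.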
 
 
@@ -21,10 +21,10 @@ There are two features: - document: article. - summary: headline.
     [https://github.com/harvardnlp/sent-summary](https://github.com/harvardnlp/sent-summary)
 *   `DatasetBuilder`:
     [`tfds.summarization.gigaword.Gigaword`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/gigaword.py)
-*   Version: `v1.1.0`
+*   Version: `v1.2.0`
 *   Versions:
 
-    *   **`1.1.0`** (default):
+    *   **`1.2.0`** (default):
 
 *   Size: `551.61 MiB`
 
 
@@ -146,6 +146,7 @@ np_datasets = tfds.as_numpy(datasets)
     *   [`math_dataset`](math_dataset.md)
     *   [`multi_nli`](multi_nli.md)
     *   [`multi_nli_mismatch`](multi_nli_mismatch.md)
+    *   [`scicite`](scicite.md)
     *   [`snli`](snli.md)
     *   [`squad`](squad.md)
     *   [`super_glue`](super_glue.md)
 
@@ -2,12 +2,14 @@
   <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
     <meta itemprop="name" content="TensorFlow Datasets" />
   </div>
+
   <meta itemprop="name" content="oxford_flowers102" />
-  <meta itemprop="description" content="&#10;The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly occurring&#10;in the United Kingdom. Each class consists of between 40 and 258 images. The images have&#10;large scale, pose and light variations. In addition, there are categories that have large&#10;variations within the category and several very similar categories.&#10;&#10;The dataset is divided into a training set, a validation set and a test set.&#10;The training set and validation set each consist of 10 images per class (totalling 1030 images each).&#10;The test set consist of the remaining 6129 images (minimum 20 per class).&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('oxford_flowers102', split='train')&#10;for ex in ds.take(4):&#10;  print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+  <meta itemprop="description" content="&#10;The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly occurring&#10;in the United Kingdom. Each class consists of between 40 and 258 images. The images have&#10;large scale, pose and light variations. In addition, there are categories that have large&#10;variations within the category and several very similar categories.&#10;&#10;The dataset is divided into a training set, a validation set and a test set.&#10;The training set and validation set each consist of 10 images per class (totalling 1020 images each).&#10;The test set consists of the remaining 6149 images (minimum 20 per class).&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('oxford_flowers102', split='train')&#10;for ex in ds.take(4):&#10;  print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
   <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/oxford_flowers102" />
   <meta itemprop="sameAs" content="https://www.robots.ox.ac.uk/~vgg/data/flowers/102/" />
   <meta itemprop="citation" content="@InProceedings{Nilsback08,&#10;   author = &quot;Nilsback, M-E. and Zisserman, A.&quot;,&#10;   title = &quot;Automated Flower Classification over a Large Number of Classes&quot;,&#10;   booktitle = &quot;Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing&quot;,&#10;   year = &quot;2008&quot;,&#10;   month = &quot;Dec&quot;&#10;}&#10;" />
 </div>
+
 # `oxford_flowers102`
 
 The Oxford Flowers 102 dataset is a consistent of 102 flower categories commonly
@@ -18,8 +20,8 @@ very similar categories.
 
 The dataset is divided into a training set, a validation set and a test set. The
 training set and validation set each consist of 10 images per class (totalling
-1030 images each). The test set consist of the remaining 6129 images (minimum 20
-per class).
+1020 images each). The test set consists of the remaining 6149 images (minimum
+20 per class).
 
 *   URL:
     [https://www.robots.ox.ac.uk/~vgg/data/flowers/102/](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)
 
@@ -0,0 +1,85 @@
+<div itemscope itemtype="http://schema.org/Dataset">
+  <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
+    <meta itemprop="name" content="TensorFlow Datasets" />
+  </div>
+
+  <meta itemprop="name" content="scicite" />
+  <meta itemprop="description" content="&#10;This is a dataset for classifying citation intents in academic papers.&#10;The main citation intent label for each Json object is specified with the label&#10;key while the citation context is specified in with a context key. Example:&#10;{&#10; 'string': 'In chacma baboons, male-infant relationships can be linked to both&#10;    formation of friendships and paternity success [30,31].'&#10; 'sectionName': 'Introduction',&#10; 'label': 'background',&#10; 'citingPaperId': '7a6b2d4b405439',&#10; 'citedPaperId': '9d1abadc55b5e0',&#10; ...&#10; }&#10;You may obtain the full information about the paper using the provided paper ids&#10;with the Semantic Scholar API (https://api.semanticscholar.org/).&#10;The labels are:&#10;Method, Background, Result&#10;&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load('scicite', split='train')&#10;for ex in ds.take(4):&#10;  print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+  <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/scicite" />
+  <meta itemprop="sameAs" content="https://github.com/allenai/scicite" />
+  <meta itemprop="citation" content="&#10;@InProceedings{Cohan2019Structural,&#10;  author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},&#10;  title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},&#10;  booktitle=&quot;NAACL&quot;,&#10;  year=&quot;2019&quot;&#10;}&#10;" />
+</div>
+
+# `scicite`
+
+This is a dataset for classifying citation intents in academic papers. The main
+citation intent label for each JSON object is specified with the label key while
+the citation context is specified with a context key. Example: { 'string':
+'In chacma baboons, male-infant relationships can be linked to both formation of
+friendships and paternity success [30,31].' 'sectionName': 'Introduction',
+'label': 'background', 'citingPaperId': '7a6b2d4b405439', 'citedPaperId':
+'9d1abadc55b5e0', ... } You may obtain the full information about the paper
+using the provided paper ids with the Semantic Scholar API
+(https://api.semanticscholar.org/). The labels are: Method, Background, Result
+
+*   URL:
+    [https://github.com/allenai/scicite](https://github.com/allenai/scicite)
+*   `DatasetBuilder`:
+    [`tfds.text.scicite.Scicite`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/scicite.py)
+*   Version: `v1.0.0`
+*   Versions:
+
+    *   **`1.0.0`** (default):
+
+*   Size: `22.12 MiB`
+
+## Features
+
+```python
+FeaturesDict({
+    'citeEnd': Tensor(shape=(), dtype=tf.int64),
+    'citeStart': Tensor(shape=(), dtype=tf.int64),
+    'citedPaperId': Text(shape=(), dtype=tf.string),
+    'citingPaperId': Text(shape=(), dtype=tf.string),
+    'excerpt_index': Tensor(shape=(), dtype=tf.int32),
+    'id': Text(shape=(), dtype=tf.string),
+    'isKeyCitation': Tensor(shape=(), dtype=tf.bool),
+    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
+    'label2': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
+    'label2_confidence': Tensor(shape=(), dtype=tf.float32),
+    'label_confidence': Tensor(shape=(), dtype=tf.float32),
+    'sectionName': Text(shape=(), dtype=tf.string),
+    'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=7),
+    'string': Text(shape=(), dtype=tf.string),
+})
+```
+
+## Statistics
+
+Split      | Examples
+:--------- | -------:
+ALL        | 10,969
+TRAIN      | 8,194
+TEST       | 1,859
+VALIDATION | 916
+
+## Homepage
+
+*   [https://github.com/allenai/scicite](https://github.com/allenai/scicite)
+
+## Supervised keys (for `as_supervised=True`)
+
+`(u'string', u'label')`
+
+## Citation
+
+```
+@InProceedings{Cohan2019Structural,
+  author={Arman Cohan and Waleed Ammar and Madeleine Van Zuylen and Field Cady},
+  title={Structural Scaffolds for Citation Intent Classification in Scientific Publications},
+  booktitle="NAACL",
+  year="2019"
+}
+```
+
+--------------------------------------------------------------------------------
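
Review note: the new page lists `(u'string', u'label')` as the supervised keys. The sketch below illustrates what that pairing means, using the example record from the description with its elided fields left out. `to_supervised` and `load_scicite_pairs` are hypothetical helper names; `tfds.load(..., as_supervised=True)` is the real API. Note that the raw record carries the label as a string, whereas the `ClassLabel` feature would yield an integer class id.

```python
SUPERVISED_KEYS = ("string", "label")


def to_supervised(example, keys=SUPERVISED_KEYS):
    """Pick the (input, target) pair out of a full feature dict."""
    input_key, target_key = keys
    return example[input_key], example[target_key]


# Record from the dataset description; fields elided there are omitted here.
record = {
    "string": (
        "In chacma baboons, male-infant relationships can be linked to both "
        "formation of friendships and paternity success [30,31]."
    ),
    "sectionName": "Introduction",
    "label": "background",
    "citingPaperId": "7a6b2d4b405439",
    "citedPaperId": "9d1abadc55b5e0",
}

text, label = to_supervised(record)
print(label)


def load_scicite_pairs(split="train"):
    """Load (string, label) tuples instead of feature dicts; needs tfds installed."""
    import tensorflow_datasets as tfds

    return tfds.load("scicite", split=split, as_supervised=True)
```

With `as_supervised=True`, the other features (`sectionName`, `isKeyCitation`, `label2`, and so on) are not returned, so omit the flag when those are needed.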
@@ -2340,7 +2340,6 @@ Versions:
 *   `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features
@@ -4618,7 +4617,6 @@ Versions:
 *   `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features
@@ -6321,7 +6319,6 @@ Versions:
 *   `0.0.3`: None
 
 ### Statistics
-
 None computed
 
 ### Features