Commit 903bfd4

Update splits.md tfds Splits guide
Minor improvements: define two params as per `tfds.core.NamedSplit` API docs for better UX (namely, `k` for even subsplits and `weighted` for proportional subsplits), change "TFDS" to "TensorFlow Datasets" for consistency, split one complex sentence into two simpler ones, change "Ds" to `ds` in a comment
1 parent 816a2b8 commit 903bfd4
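
The commit message distinguishes `k` (even subsplits) from `weighted` (proportional subsplits). As a rough illustration of the difference, here is a minimal pure-Python sketch. It assumes, as the guide's `% 100` warning suggests, that subsplits are cut from 100 percent buckets. The helpers `even_slices` and `weighted_slices` are hypothetical names invented for this note; they are not part of the TensorFlow Datasets API and may not match its actual rounding behavior.

```python
# Illustrative sketch only: `even_slices` and `weighted_slices` are
# hypothetical helpers, NOT part of the TensorFlow Datasets API.
# Assumption: subsplits are contiguous ranges over 100 percent buckets.


def even_slices(k):
    """Cut [0, 100) into k contiguous percent ranges of equal width."""
    if not 0 < k <= 100:
        raise ValueError(f"k must be in (0, 100], got {k}")
    width = 100 // k
    bounds = [(i * width, (i + 1) * width) for i in range(k)]
    # Let the last range absorb any remainder so all 100 buckets are covered.
    bounds[-1] = (bounds[-1][0], 100)
    return bounds


def weighted_slices(weights):
    """Cut [0, 100) into contiguous percent ranges proportional to weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    bounds, start = [], 0
    for w in weights:
        width = 100 * w // total
        bounds.append((start, start + width))
        start += width
    # Let the last range absorb any rounding remainder.
    bounds[-1] = (bounds[-1][0], 100)
    return bounds


print(even_slices(2))              # [(0, 50), (50, 100)]
print(weighted_slices([2, 1, 1]))  # [(0, 50), (50, 75), (75, 100)]
```

Under these assumptions, `k=2` corresponds to two halves, while `weighted=[2, 1, 1]` corresponds to one half and two quarters, matching the variable names used in the changed examples.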

1 file changed: +8 -8 lines changed

docs/splits.md

Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
 # Splits
 
 All `DatasetBuilder`s expose various data subsets defined as
-[`tfds.Split`s](api_docs/python/tfds/Split.md)
+[`tfds.Split`](api_docs/python/tfds/Split.md)s
 (typically `tfds.Split.TRAIN` and `tfds.Split.TEST`). A given dataset's
 splits are defined in
 [`tfds.DatasetBuilder.info.splits`](api_docs/python/tfds/core/DatasetBuilder.md#info)
@@ -27,7 +27,7 @@ Note that a special `tfds.Split.ALL` keyword exists to merge all splits
 together:
 
 ```py
-# Ds will iterate over test, train and validation merged together
+# `ds` will iterate over test, train and validation merged together
 ds = tfds.load("mnist", split=tfds.Split.ALL)
 ```
 
@@ -36,8 +36,8 @@ ds = tfds.load("mnist", split=tfds.Split.ALL)
 You have 3 options for how to get a thinner slice of the data than the
 base splits, all based on `tfds.Split.subsplit`.
 
-*Warning*: TFDS does not currently guarantee the order of the data on disk when
-data is generated, so if you regenerate the data, the subsplits may no longer be
+*Warning*: TensorFlow Datasets does not currently guarantee the order of the data on disk when
+data is generated. Therefore, if you regenerate the data, the subsplits may no longer be
 the same.
 
 *Warning*: If the `total_number_examples % 100 != 0`, then remainder examples
@@ -46,7 +46,7 @@ may not be evenly distributed among subsplits.
 ### Specify number of subsplits
 
 ```py
-train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(2)
+train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(k=2)
 
 dataset = tfds.load("mnist", split=train_half_1)
 ```
@@ -64,7 +64,7 @@ dataset = tfds.load("mnist", split=middle_50_percent)
 ### Specifying weights
 
 ```py
-half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit([2, 1, 1])
+half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit(weighted=[2, 1, 1])
 
 dataset = tfds.load("mnist", split=half)
 ```
@@ -78,7 +78,7 @@ It's possible to compose the above operations:
 split = tfds.Split.TRAIN.subsplit(tfds.percent[:50]) + tfds.Split.TEST
 
 # Split the combined TRAIN and TEST splits into 2
-first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(2)
+first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(k=2)
 ```
 
 Note that a split cannot be added twice, and subsplitting can only happen once.
@@ -89,7 +89,7 @@ For example, these are invalid:
 split = tfds.Split.TRAIN.subsplit(tfds.percent[:25]) + tfds.Split.TRAIN
 
 # INVALID! Subsplit of subsplit
-split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(2)
+split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(k=2)
 
 # INVALID! Subsplit of subsplit
 split = (tfds.Split.TRAIN.subsplit(tfds.percent[:25]) +
