# Splits

All `DatasetBuilder`s expose various data subsets defined as
- [`tfds.Split`s](api_docs/python/tfds/Split.md)
+ [`tfds.Split`](api_docs/python/tfds/Split.md)s
(typically `tfds.Split.TRAIN` and `tfds.Split.TEST`). A given dataset's
splits are defined in
[`tfds.DatasetBuilder.info.splits`](api_docs/python/tfds/core/DatasetBuilder.md#info)
@@ -27,7 +27,7 @@ Note that a special `tfds.Split.ALL` keyword exists to merge all splits
together:

```py
- # Ds will iterate over test, train and validation merged together
+ # `ds` will iterate over test, train and validation merged together
ds = tfds.load("mnist", split=tfds.Split.ALL)
```
@@ -36,8 +36,8 @@ ds = tfds.load("mnist", split=tfds.Split.ALL)
You have 3 options for how to get a thinner slice of the data than the
base splits, all based on `tfds.Split.subsplit`.

- *Warning*: TFDS does not currently guarantee the order of the data on disk when
- data is generated, so if you regenerate the data, the subsplits may no longer be
+ *Warning*: TensorFlow Datasets does not currently guarantee the order of the data on disk when
+ data is generated. Therefore, if you regenerate the data, the subsplits may no longer be
the same.

*Warning*: If the `total_number_examples % 100 != 0`, then remainder examples
@@ -46,7 +46,7 @@ may not be evenly distributed among subsplits.
### Specify number of subsplits

```py
- train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(2)
+ train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(k=2)

dataset = tfds.load("mnist", split=train_half_1)
```
@@ -64,7 +64,7 @@ dataset = tfds.load("mnist", split=middle_50_percent)
### Specifying weights

```py
- half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit([2, 1, 1])
+ half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit(weighted=[2, 1, 1])

dataset = tfds.load("mnist", split=half)
```
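
To make the weighted form concrete, here is a minimal sketch of how relative weights such as `[2, 1, 1]` could translate into contiguous percent ranges. The helper `weights_to_percent_ranges` is hypothetical and not part of `tfds`. The rounding rule (integer boundaries on the cumulative share) is an assumption, and the library's actual behavior may differ.

```python
def weights_to_percent_ranges(weights):
    """Map relative integer weights to contiguous (start, stop) percent ranges.

    Hypothetical helper for illustration only; not part of `tfds`.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")

    total = sum(weights)
    ranges = []
    start = 0
    running = 0
    for w in weights:
        running += w
        # Assumed rounding rule: floor the cumulative share, so the
        # final range always ends at exactly 100.
        stop = 100 * running // total
        ranges.append((start, stop))
        start = stop
    return ranges


print(weights_to_percent_ranges([2, 1, 1]))  # [(0, 50), (50, 75), (75, 100)]
```

Under this assumption, `half` would cover the first 50% of `TRAIN`, while `quarter1` and `quarter2` would each cover one of the following 25% ranges.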
@@ -78,7 +78,7 @@ It's possible to compose the above operations:
split = tfds.Split.TRAIN.subsplit(tfds.percent[:50]) + tfds.Split.TEST

# Split the combined TRAIN and TEST splits into 2
- first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(2)
+ first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(k=2)
```
Note that a split cannot be added twice, and subsplitting can only happen once.
@@ -89,7 +89,7 @@ For example, these are invalid:
split = tfds.Split.TRAIN.subsplit(tfds.percent[:25]) + tfds.Split.TRAIN

# INVALID! Subsplit of subsplit
- split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(2)
+ split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(k=2)

# INVALID! Subsplit of subsplit
split = (tfds.Split.TRAIN.subsplit(tfds.percent[:25]) +