# Splits

All `DatasetBuilder`s expose various data subsets defined as
- [`tfds.Split`s](api_docs/python/tfds/Split.md)
- (typically `tfds.Split.TRAIN` and `tfds.Split.TEST`). A given dataset's
- splits are defined in
+ [`tfds.Split`](api_docs/python/tfds/Split.md)s (typically `tfds.Split.TRAIN` and
+ `tfds.Split.TEST`). A given dataset's splits are defined in
[`tfds.DatasetBuilder.info.splits`](api_docs/python/tfds/core/DatasetBuilder.md#info)
- and are accessible through
- [`tfds.load`](api_docs/python/tfds/load.md)
- and
+ and are accessible through [`tfds.load`](api_docs/python/tfds/load.md) and
[`tfds.DatasetBuilder.as_dataset`](api_docs/python/tfds/core/DatasetBuilder.md#as_dataset),
both of which take `split=` as a keyword argument.

@@ -27,7 +24,7 @@ Note that a special `tfds.Split.ALL` keyword exists to merge all splits
together:

```py
- # Ds will iterate over test, train and validation merged together
+ # `ds` will iterate over test, train and validation merged together
ds = tfds.load("mnist", split=tfds.Split.ALL)
```

@@ -36,17 +33,17 @@ ds = tfds.load("mnist", split=tfds.Split.ALL)
You have 3 options for how to get a thinner slice of the data than the
base splits, all based on `tfds.Split.subsplit`.

- *Warning*: TFDS does not currently guarantee the order of the data on disk when
- data is generated, so if you regenerate the data, the subsplits may no longer be
- the same.
+ *Warning*: TensorFlow Datasets does not currently guarantee the order of the
+ data on disk when data is generated. Therefore, if you regenerate the data, the
+ subsplits may no longer be the same.

*Warning*: If the `total_number_examples % 100 != 0`, then remainder examples
may not be evenly distributed among subsplits.

### Specify number of subsplits

```py
- train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(2)
+ train_half_1, train_half_2 = tfds.Split.TRAIN.subsplit(k=2)

dataset = tfds.load("mnist", split=train_half_1)
```
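
As a rough illustration of the `k=` form in the hunk above, the sketch below maps a k-way subsplit onto percent ranges. It assumes that subsplits are expressed as slices over 100 percent buckets, as the `tfds.percent[...]` syntax suggests. `percent_ranges_for_k` is a hypothetical helper, not part of TensorFlow Datasets, and the library's actual rounding may differ.

```python
def percent_ranges_for_k(k):
    """Return k contiguous (start, stop) percent ranges covering 0..100.

    Hypothetical helper for illustration only; not a TFDS function.
    """
    if not 0 < k <= 100:
        raise ValueError("k must be between 1 and 100")
    # Cumulative boundaries, rounded down to whole percents.
    bounds = [100 * i // k for i in range(k + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


print(percent_ranges_for_k(2))  # [(0, 50), (50, 100)]
print(percent_ranges_for_k(3))  # [(0, 33), (33, 66), (66, 100)]
```

With `k=3` the last range is one percent wider than the others, which is the kind of uneven remainder the second warning above describes.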
@@ -64,7 +61,7 @@ dataset = tfds.load("mnist", split=middle_50_percent)
### Specifying weights

```py
- half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit([2, 1, 1])
+ half, quarter1, quarter2 = tfds.Split.TRAIN.subsplit(weighted=[2, 1, 1])

dataset = tfds.load("mnist", split=half)
```
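
To make the `weighted=[2, 1, 1]` call above more concrete, here is a minimal sketch of how relative weights could translate into percent ranges. It assumes that weights are normalized against their sum and rounded down to whole percents. `percent_ranges_for_weights` is a hypothetical name, and TensorFlow Datasets may compute the boundaries differently.

```python
def percent_ranges_for_weights(weights):
    """Turn relative weights into contiguous (start, stop) percent ranges.

    Hypothetical helper for illustration only; not a TFDS function.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    total = sum(weights)
    bounds = [0]
    running = 0
    for w in weights:
        running += w
        # Cumulative share of the total, rounded down to a whole percent.
        bounds.append(100 * running // total)
    return list(zip(bounds[:-1], bounds[1:]))


print(percent_ranges_for_weights([2, 1, 1]))  # [(0, 50), (50, 75), (75, 100)]
```

Under these assumptions, `half` would correspond to the first 50 percent of the split and the two quarters to the following 25 percent each.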
@@ -78,7 +75,7 @@ It's possible to compose the above operations:
split = tfds.Split.TRAIN.subsplit(tfds.percent[:50]) + tfds.Split.TEST

# Split the combined TRAIN and TEST splits into 2
- first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(2)
+ first_half, second_half = (tfds.Split.TRAIN + tfds.Split.TEST).subsplit(k=2)
```

Note that a split cannot be added twice, and subsplitting can only happen once.
@@ -89,7 +86,7 @@ For example, these are invalid:
split = tfds.Split.TRAIN.subsplit(tfds.percent[:25]) + tfds.Split.TRAIN

# INVALID! Subsplit of subsplit
- split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(2)
+ split = tfds.Split.TRAIN.subsplit(tfds.percent[0:25]).subsplit(k=2)

# INVALID! Subsplit of subsplit
split = (tfds.Split.TRAIN.subsplit(tfds.percent[:25]) +