Commit 38c8845

Merge branch 'master' of github.com:FluxML/FastAI.jl
2 parents 6db2ad2 + f41d62b commit 38c8845

36 files changed: +2524 −1581 lines

Project.toml

Lines changed: 6 additions & 3 deletions

````diff
@@ -18,21 +18,24 @@ FileTrees = "72696420-646e-6120-6e77-6f6420746567"
 FixedPointNumbers = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 FluxTraining = "7bf95e4d-ca32-48da-9824-f0dc5310474f"
+JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
 LearnBase = "7f8f8fb0-2700-5f03-b4bd-41f8cfc144b6"
 MLDataPattern = "9920b226-0b2a-5f5f-9153-9aa70a013f8b"
 MosaicViews = "e94cdb99-869f-56ef-bcf0-1ae2bcbe0389"
 Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
+ShowCases = "605ecd9f-84a6-4c9e-81e2-4798472b76a3"
 StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

 [compat]
-AbstractPlotting = "0.17"
+AbstractPlotting = "0.17, 0.18"
 Animations = "0.4"
 BSON = "0.3"
 Colors = "0.12"
-DLPipelines = "0.2"
+DLPipelines = "0.2.1"
 DataAugmentation = "0.2.2"
 DataDeps = "0.7"
 DataLoaders = "0.1"
@@ -41,7 +44,7 @@ FilePathsBase = "0.9"
 FileTrees = "0.3"
 FixedPointNumbers = "0.8"
 Flux = "0.12"
-FluxTraining = "0.1"
+FluxTraining = "0.1, 0.2"
 LearnBase = "0.3"
 MLDataPattern = "0.5"
 MosaicViews = "0.2"
````
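
A note on the compat changes above: a comma-separated entry such as `FluxTraining = "0.1, 0.2"` is a union of caret specifiers, so both the 0.1.x and 0.2.x release series are accepted. The snippet below checks this interactively; `Pkg.Types.semver_spec` is an internal Pkg helper, so treat it as an illustration rather than a stable API.

```julia
using Pkg

# Parse a compat string the same way Pkg does (internal helper, may move between Pkg versions).
spec = Pkg.Types.semver_spec("0.1, 0.2")

v"0.1.3" in spec   # true:  the 0.1.x series is allowed
v"0.2.5" in spec   # true:  the 0.2.x series is allowed
v"0.3.0" in spec   # false: covered by neither specifier
```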

docs/Manifest.toml

Lines changed: 12 additions & 0 deletions

````diff
@@ -1207,6 +1207,12 @@ git-tree-sha1 = "30cd8c360c54081f806b1ee14d2eecbef3c04c49"
 uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
 version = "0.9.8"

+[[StringDistances]]
+deps = ["Distances"]
+git-tree-sha1 = "a4c05337dfe6c4963253939d2acbdfa5946e8e31"
+uuid = "88034a9c-02f8-509d-84a9-84ec65e18404"
+version = "0.10.0"
+
 [[StructArrays]]
 deps = ["Adapt", "DataAPI", "Tables"]
 git-tree-sha1 = "44b3afd37b17422a62aea25f04c1f7e09ce6b07f"
@@ -1253,6 +1259,12 @@ version = "0.1.15"
 deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
 uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

+[[TestImages]]
+deps = ["AxisArrays", "ColorTypes", "FileIO", "OffsetArrays", "Pkg", "StringDistances"]
+git-tree-sha1 = "883a8dbc6500302e39a4a40f47dac475e46dd988"
+uuid = "5e47fb64-e119-507b-a336-dd2b206d9990"
+version = "1.5.0"
+
 [[ThreadPools]]
 deps = ["Printf", "RecipesBase", "Statistics"]
 git-tree-sha1 = "705ccc29d575b87cceb359dfea19f4653d06df8f"
````

docs/Project.toml

Lines changed: 1 addition & 0 deletions

````diff
@@ -10,3 +10,4 @@ ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
 ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
 ImageShow = "4e3cecfd-b093-5904-9786-8bbb286a6a31"
 Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
+TestImages = "5e47fb64-e119-507b-a336-dd2b206d9990"
````

docs/background/datapipelines.md

Lines changed: 15 additions & 15 deletions

````diff
@@ -1,6 +1,6 @@
 # Performant data pipelines

-*Explainer on how data pipelines in FastAI.jl are made fast and how to make yours fast.*
+*Bottlenecks in data pipelines and how to measure and fix them*

 When training large deep learning models on a GPU we clearly want wait as short as possible for the training to complete. The hardware bottleneck is usually the GPU power you have available to you. This means that data pipelines need to be fast enough to keep the GPU at 100% utilization, that is, keep it from "starving". Reducing the time the GPU has to wait for the next batch of data directly lowers the training time until the GPU is fully utilized. There are other ways to reduce training time like using hyperparameter schedules and different optimizers for faster convergence, but we'll only talk about improving GPU utilization here.

@@ -127,25 +127,25 @@ So, you've identified the data pipeline as a performance bottleneck. What now? B

 If the data loading is still slowing down training, you'll probably have to speed up the loading of each observation. As mentioned above, this can be broken down into observation loading and encoding. The exact strategy will depend on your use case, but here are some examples.

-- Reduce loading time of image datasets by presizing
+### Reduce loading time of image datasets by presizing

-For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk itself can take some time. If your dataset consists of 1920x1080 resolution images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in 3 sizes: original, 360px and 180px.
+For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk itself can take some time. If your dataset consists of 1920x1080 resolution images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in 3 sizes: original, 360px and 180px.

-```julia
-data_orig = loadtaskdata(datasetpath("imagenette2"), ImageClassificationTask)
-@time eachobsparallel(data_orig, buffered = false)
+```julia
+data_orig = loadtaskdata(datasetpath("imagenette2"), ImageClassificationTask)
+@time eachobsparallel(data_orig, buffered = false)

-data_320px = loadtaskdata(datasetpath("imagenette2-320"), ImageClassificationTask)
-@time eachobsparallel(data_320px, buffered = false)
+data_320px = loadtaskdata(datasetpath("imagenette2-320"), ImageClassificationTask)
+@time eachobsparallel(data_320px, buffered = false)

-data_160px = loadtaskdata(datasetpath("imagenette2-160"), ImageClassificationTask)
-@time eachobsparallel(data_160px, buffered = false)
-```
+data_160px = loadtaskdata(datasetpath("imagenette2-160"), ImageClassificationTask)
+@time eachobsparallel(data_160px, buffered = false)
+```

-- Reducing allocations with inplace operations
+### Reducing allocations with inplace operations

-When implementing the `LearningMethod` interface, you have the option to implement `encode!(buf, method, context, sample)`, an inplace version of `encode` that reuses a buffer to avoid allocations. Reducing allocations often speeds up the encoding step and can also reduce the frequency of garbage collector pauses during training which can reduce GPU utilization.
+When implementing the `LearningMethod` interface, you have the option to implement `encode!(buf, method, context, sample)`, an inplace version of `encode` that reuses a buffer to avoid allocations. Reducing allocations often speeds up the encoding step and can also reduce the frequency of garbage collector pauses during training which can reduce GPU utilization.

-- Using efficient data augmentation
+### Using efficient data augmentation

-Many kinds of augmentation can be composed efficiently. A prime example of this are image transformations like resizing, scaling and cropping which are powered by [DataAugmentation.jl](https://github.com/lorenzoh/DataAugmentation.jl). See [its documentation](https://lorenzoh.github.io/DataAugmentation.jl/dev/docs/literate/intro.html) to find out how to implement efficient, composable data transformations.
+Many kinds of augmentation can be composed efficiently. A prime example of this are image transformations like resizing, scaling and cropping which are powered by [DataAugmentation.jl](https://github.com/lorenzoh/DataAugmentation.jl). See [its documentation](https://lorenzoh.github.io/DataAugmentation.jl/dev/docs/literate/intro.html) to find out how to implement efficient, composable data transformations.
````
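
The "Reducing allocations with inplace operations" section above refers to `encode!(buf, method, context, sample)`, the buffered counterpart of `encode` in the `LearningMethod` interface. Below is a rough, self-contained sketch of that pattern; the `MyMethod` struct, the `(x, y)` buffer layout and the sample fields are made up for illustration and are not FastAI.jl's actual implementation.

```julia
# Hypothetical learning method with a fixed image size and number of classes.
struct MyMethod
    size::Tuple{Int,Int}
    nclasses::Int
end

# Allocating version: a fresh array and one-hot vector are created for every sample.
function encode(method::MyMethod, context, sample)
    x = Float32.(sample.image)
    y = zeros(Float32, method.nclasses)
    y[sample.class] = 1
    return (x, y)
end

# Buffered version: writes into the preallocated buffer `buf = (x, y)` so the
# data loader can reuse the same memory on every iteration.
function encode!(buf, method::MyMethod, context, sample)
    x, y = buf
    x .= sample.image          # elementwise copy/convert, no new array
    fill!(y, 0)                # reuse the one-hot buffer
    y[sample.class] = 1
    return buf
end

# Usage: allocate once, then reuse for every observation.
method = MyMethod((128, 128), 10)
buf = (zeros(Float32, 128, 128, 3), zeros(Float32, 10))
sample = (image = rand(Float32, 128, 128, 3), class = 3)
encode!(buf, method, nothing, sample)
```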

docs/howto/augmentvision.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -12,7 +12,7 @@ function showfig(f)
 end
 ```

-{cell=main, result=false}
+{cell=main result=false output=false}
 ```julia
 using FastAI
 using CairoMakie
@@ -52,4 +52,8 @@ method3 = ImageClassification(
 aug_projection=augs_projection(), aug_image=augs_lighting())
 xs3, ys3 = FastAI.makebatch(method3, data, fill(4, 9))
 f = FastAI.plotbatch(method3, xs3, ys3)
-```
+```
+
+## Augmentation in custom learning methods
+
+To use projective and image augmentations in custom learning methods for computer vision tasks, see [`ProjectiveTransforms`](#) and [`ImagePreprocessing`](#), two helpers that every vision method in FastAI.jl uses.
````
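
The new "Augmentation in custom learning methods" section above points to [`ProjectiveTransforms`](#) and [`ImagePreprocessing`](#), which build on transforms from DataAugmentation.jl. Below is a rough sketch of composing such transforms directly; the transform names and sizes are illustrative and may differ between DataAugmentation.jl versions, and `testimage` comes from TestImages, which this commit adds to the docs environment.

```julia
using DataAugmentation
using TestImages: testimage

# Composed projective transforms can be applied as a single warp instead of
# materializing an intermediate image after every step.
tfm = ScaleKeepAspect((160, 160)) |> Rotate(10) |> CenterCrop((128, 128))

img = testimage("lighthouse")
item = Image(img)          # wrap the raw array so transforms know how to treat it
titem = apply(tfm, item)
itemdata(titem)            # the transformed 128x128 image
```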

docs/methods/imageclassification.md

Lines changed: 15 additions & 3 deletions

````diff
@@ -4,9 +4,17 @@ When doing image classification, we want to train a model to classify a given im

 ## Single-label classification

-In the simple case, every image will have one class from a list associated with it. For example, the Cats&Dogs dataset contains pictures of cats and dogs (duh). . The learning method [`ImageClassification`](#) handles single-label image classification. Let's load some samples and visualize them:
+In the simple case, every image will have one class from a list associated with it. For example, the Cats&Dogs dataset contains pictures of cats and dogs. The learning method [`ImageClassification`](#) handles single-label image classification. Let's load some samples and visualize them:

-{cell=main}
+{cell=main output=false result=false style="display:none;"}
+```julia
+using Images: load
+function showfig(f)
+    save("fig.png", f)
+    load("fig.png")
+end
+```
+{cell=main result=false output=false}
 ```julia
 using CairoMakie
 using FastAI
@@ -15,7 +23,11 @@ data = loadtaskdata(dir, ImageClassificationTask)
 samples = [getobs(data, i) for i in rand(1:nobs(data), 9)]
 classes = Datasets.getclassesclassification(dir)
 method = ImageClassification(classes, (128, 128))
-plotsamples(method, samples)
+f = plotsamples(method, samples)
+```
+{cell=main output=false style="display:none;"}
+```julia
+showfig(f)
 ```

 With a method and a data container, we can easily construct a [`Learner`](#):
````
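
The last context line above introduces constructing a [`Learner`](#). A hedged sketch of what that can look like, reusing `method` and `data` from the snippet in the diff: `methodlearner`, `Models.xresnet18` and the callbacks shown follow FastAI.jl's quickstart-style API, and their exact signatures at this commit are an assumption.

```julia
using FastAI

# `method` and `data` as constructed in the diff above.
learner = methodlearner(method, data, Models.xresnet18(),
                        ToGPU(), Metrics(accuracy))
fitonecycle!(learner, 5)   # train with a one-cycle schedule for 5 epochs
```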
