|
# Performant data pipelines

*Bottlenecks in data pipelines and how to measure and fix them*
|
When training large deep learning models on a GPU, we want to wait as little as possible for training to complete. The hardware bottleneck is usually the GPU power available to you, which means data pipelines need to be fast enough to keep the GPU at 100% utilization, that is, to keep it from "starving". Reducing the time the GPU has to wait for the next batch of data directly lowers training time, up to the point where the GPU is fully utilized. There are other ways to reduce training time, like hyperparameter schedules and different optimizers for faster convergence, but here we'll only talk about improving GPU utilization.
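
One rough way to check whether the pipeline keeps up is to time a full pass over the data iterator on its own and compare it to the duration of a training epoch; if the two are close, the GPU is starving. Below is a minimal sketch, assuming the FastAI.jl data-loading functions that are also used further down this page:

```julia
using FastAI, DataLoaders  # assumed imports: loadtaskdata/datasetpath from FastAI.jl, eachobsparallel from DataLoaders.jl

# Time one full pass over the observations, with no model involved.
data = loadtaskdata(datasetpath("imagenette2-160"), ImageClassificationTask)
t_data = @elapsed for _ in eachobsparallel(data, buffered = false) end

# Compare `t_data` to the wall-clock time of one training epoch: if it is a
# large fraction of the epoch time, the GPU is waiting on data.
@show t_data
```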
|
So, you've identified the data pipeline as a performance bottleneck. What now?
|
If the data loading is still slowing down training, you'll probably have to speed up the loading of each observation. As mentioned above, this can be broken down into observation loading and encoding. The exact strategy will depend on your use case, but here are some examples.
|
### Reduce loading time of image datasets by presizing

For many computer vision tasks, you will resize and crop images to a specific size during training for GPU performance reasons. If the images themselves are large, loading them from disk can itself take a significant amount of time. If your dataset consists of 1920x1080 images but you're resizing them to 256x256 during training, you're wasting a lot of time loading the large images. *Presizing* means saving resized versions of each image to disk once, and then loading these smaller versions during training. We can see the performance difference using ImageNette since it comes in three sizes: the original resolution, 320px and 160px.

```julia
using FastAI, DataLoaders  # assumed imports: loadtaskdata/datasetpath from FastAI.jl, eachobsparallel from DataLoaders.jl

# Time a full pass over each ImageNette variant to compare loading speed.
data_orig = loadtaskdata(datasetpath("imagenette2"), ImageClassificationTask)
@time for _ in eachobsparallel(data_orig, buffered = false) end

data_320px = loadtaskdata(datasetpath("imagenette2-320"), ImageClassificationTask)
@time for _ in eachobsparallel(data_320px, buffered = false) end

data_160px = loadtaskdata(datasetpath("imagenette2-160"), ImageClassificationTask)
@time for _ in eachobsparallel(data_160px, buffered = false) end
```
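
If you want to presize a dataset yourself, the sketch below shows one way to do it, assuming an Images.jl/FileIO.jl workflow; the `presize` helper, the directory layout and the target size are illustrative and not part of the FastAI.jl API:

```julia
using Images, FileIO  # assumed dependencies for this sketch

# Hypothetical helper: write a downscaled copy of every image once, then point
# your dataset at `dstdir` during training.
function presize(srcdir, dstdir; shortside = 160)
    for (root, _, files) in walkdir(srcdir), file in files
        lowercase(splitext(file)[2]) in (".jpg", ".jpeg", ".png") || continue
        srcpath = joinpath(root, file)
        dstpath = joinpath(dstdir, relpath(srcpath, srcdir))
        mkpath(dirname(dstpath))
        img = load(srcpath)
        ratio = shortside / minimum(size(img))  # scale so the shorter side becomes `shortside`
        ratio < 1 || (cp(srcpath, dstpath; force = true); continue)  # already small enough, just copy
        save(dstpath, imresize(img, ratio = ratio))
    end
end

presize("imagenette2/train", "imagenette2-presized/train")
```

The 320px and 160px ImageNette variants used above are this idea applied ahead of time.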

### Reducing allocations with in-place operations

When implementing the `LearningMethod` interface, you have the option to implement `encode!(buf, method, context, sample)`, an in-place version of `encode` that reuses a buffer to avoid allocations. Reducing allocations often speeds up the encoding step and can also reduce the frequency of garbage collector pauses during training, which would otherwise lower GPU utilization.
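
To make this concrete, here is a hedged sketch of an allocating `encode` next to an in-place `encode!`; the `ResizeMethod` struct and the use of `imresize!` are assumptions for illustration, and only the `encode!(buf, method, context, sample)` signature comes from the interface described above:

```julia
using ImageTransformations  # imresize/imresize! used in this hypothetical example

# Illustrative method type, not part of FastAI.jl.
struct ResizeMethod
    sz::Tuple{Int,Int}
end

# Allocating version: creates a fresh output array for every observation.
encode(method::ResizeMethod, context, image) = imresize(image, method.sz)

# In-place version: fills the preallocated buffer `buf` instead of allocating.
function encode!(buf, method::ResizeMethod, context, image)
    imresize!(buf, image)
    return buf
end

# Usage: allocate the buffer once and reuse it for every observation.
image = rand(Float32, 512, 512)
buf = Matrix{Float32}(undef, 128, 128)
encode!(buf, ResizeMethod((128, 128)), nothing, image)
```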

### Using efficient data augmentation

Many kinds of augmentation can be composed efficiently. A prime example is image transformations like resizing, scaling and cropping, which are powered by [DataAugmentation.jl](https://github.com/lorenzoh/DataAugmentation.jl). See [its documentation](https://lorenzoh.github.io/DataAugmentation.jl/dev/docs/literate/intro.html) to find out how to implement efficient, composable data transformations.
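
As a small sketch of what such a composed pipeline can look like, assuming DataAugmentation.jl's item/transform API (`Image`, `apply`, `itemdata`) and transforms whose exact names you should check against its documentation:

```julia
using DataAugmentation, TestImages  # TestImages only provides a sample image

img = testimage("lighthouse")

# Composing with `|>` lets DataAugmentation.jl fuse the projective steps, so
# the image is warped and cropped in a single pass rather than once per transform.
tfm = Rotate(10) |> CenterResizeCrop((128, 128))

item = Image(img)                       # wrap the raw image as an item
augmented = itemdata(apply(tfm, item))  # apply the fused pipeline and unwrap the result
```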