This repository was archived by the owner on Aug 26, 2022. It is now read-only.
[3.2. Initialize input tensor](#id3)

[3.3. Create models for benchmarking](#id4)

[3.4. Fuse kernels with the custom CUDA kernels](#34-fuse-kernels-with-the-custom-cuda-kernels)

[3.5. Warm-up (compiling)](#id5)

[3.6. Benchmark](#id6)

## 1. JIT based fusion

How to use the JIT based fusion?
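Since the walkthrough for this section is truncated here, the following is a minimal, generic sketch of JIT based fusion using plain TorchScript (`torch.jit.script`), not this library's own API: a chain of elementwise operations that the TorchScript fuser can compile into a single kernel. The `bias_gelu` function is a hypothetical example, not taken from the original document.

```python
import torch

# A chain of elementwise ops is a typical fusion candidate: after a few
# warm-up calls, the TorchScript fuser can merge them into one kernel.
@torch.jit.script
def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    y = x + bias
    # Exact (erf-based) GELU: y * 0.5 * (1 + erf(y / sqrt(2)))
    return y * 0.5 * (1.0 + torch.erf(y * 0.7071067811865476))

x = torch.randn(16, 64)
bias = torch.randn(64)
fused_out = bias_gelu(x, bias)

# Eager reference computation for comparison.
eager_out = torch.nn.functional.gelu(x + bias)
```

On GPU, the fused version avoids materializing the intermediate `x + bias` tensor in global memory between ops; on CPU the scripted function still runs but may not be fused.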
oslo: 0.20798110961914062

## 2. Memory efficient fusion

How to use the memory efficient fusion?

The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch.
AOT Autograd fuses all fusible areas of the model and also optimizes the backward graph with a novel mechanism called [min-cut rematerialization](https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467).
Because the backward graph can be optimized, memory efficient fusion yields a much larger speedup in training than in inference.

However, AOT Autograd is still under development, so unexpected bugs may occur.
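The underlying functorch API can also be called directly; the sketch below assumes `functorch.compile.memory_efficient_fusion` (how this library wraps it may differ). The `bias_add_residual` function is a hypothetical elementwise chain used only for illustration.

```python
import torch
from functorch.compile import memory_efficient_fusion

# Hypothetical elementwise chain -- a typical fusion candidate.
def bias_add_residual(x, bias, residual):
    return (x + bias) * 0.5 + residual

# AOT Autograd traces forward AND backward graphs ahead of time,
# fusing them and rematerializing cheap intermediates in backward
# (min-cut rematerialization) instead of saving them all.
fused_fn = memory_efficient_fusion(bias_add_residual)

x = torch.randn(4, 8, requires_grad=True)
bias = torch.randn(8)
residual = torch.randn(4, 8)

out = fused_fn(x, bias, residual)
out.sum().backward()  # backward runs through the optimized graph
```

Because the backward graph is part of what gets optimized, this is where the training-time speedup described above comes from; a pure-inference call sees only the forward fusion.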