This repository was archived by the owner on Aug 26, 2022. It is now read-only.
[3.2. Initialize input tensor](#id3)

[3.3. Create models for benchmarking](#id4)

[3.4. Fuse kernels with the custom CUDA kernels](#34-fuse-kernels-with-the-custom-cuda-kernels)

[3.5. Warm-up (compiling)](#id5)

[3.6. Benchmark](#id6)

## 1. JIT based fusion

How to use the JIT based fusion?
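Since the walkthrough for this section is truncated here, the following is a minimal, generic sketch of JIT based fusion using plain TorchScript (`torch.jit.script`), not this library's own API: a chain of elementwise operations that the TorchScript fuser can compile into a single kernel. The `bias_gelu` function is a hypothetical example, not taken from the original document.

```python
import torch

# A chain of elementwise ops is a typical fusion candidate: after a few
# warm-up calls, the TorchScript fuser can merge them into one kernel.
@torch.jit.script
def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    y = x + bias
    # Exact (erf-based) GELU: y * 0.5 * (1 + erf(y / sqrt(2)))
    return y * 0.5 * (1.0 + torch.erf(y * 0.7071067811865476))

x = torch.randn(16, 64)
bias = torch.randn(64)
fused_out = bias_gelu(x, bias)

# Eager reference computation for comparison.
eager_out = torch.nn.functional.gelu(x + bias)
```

On GPU, the fused version avoids materializing the intermediate `x + bias` tensor in global memory between ops; on CPU the scripted function still runs but may not be fused.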
oslo: 0.20798110961914062

## 2. Memory efficient fusion

How to use the memory efficient fusion?

The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch.
AOT Autograd fuses all fusible areas of the model and also optimizes the backward graph with a novel mechanism called [min-cut rematerialization](https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467).
Because the backward graph can be optimized, memory efficient fusion yields a much larger speedup in training than in inference.

However, AOT Autograd is still under development, so unexpected bugs may occur.
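The underlying functorch API can also be called directly; the sketch below assumes `functorch.compile.memory_efficient_fusion` (how this library wraps it may differ). The `bias_add_residual` function is a hypothetical elementwise chain used only for illustration.

```python
import torch
from functorch.compile import memory_efficient_fusion

# Hypothetical elementwise chain -- a typical fusion candidate.
def bias_add_residual(x, bias, residual):
    return (x + bias) * 0.5 + residual

# AOT Autograd traces forward AND backward graphs ahead of time,
# fusing them and rematerializing cheap intermediates in backward
# (min-cut rematerialization) instead of saving them all.
fused_fn = memory_efficient_fusion(bias_add_residual)

x = torch.randn(4, 8, requires_grad=True)
bias = torch.randn(8)
residual = torch.randn(4, 8)

out = fused_fn(x, bias, residual)
out.sum().backward()  # backward runs through the optimized graph
```

Because the backward graph is part of what gets optimized, this is where the training-time speedup described above comes from; a pure-inference call sees only the forward fusion.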