This repository was archived by the owner on Aug 26, 2022. It is now read-only.

Commit 3d7e792

Merge pull request #31 from tunib-ai/kevin
fix docs link
2 parents: e1c85e2 + 6ae40a5

File tree

3 files changed: +21 −21 lines changed


docs/TUTORIALS/kernel_fusion.html

Lines changed: 7 additions & 7 deletions
@@ -455,18 +455,18 @@ <h2>Table of contents<a class="headerlink" href="#table-of-contents" title="Perm
 <ul>
 <li><p><a class="reference external" href="#limitation">2.1. Limitation</a></p></li>
 <li><p><a class="reference external" href="#fuse-kernels-with-aot-autograd">2.2. Fuse kernels with AOT Autograd</a></p></li>
-<li><p><a class="reference external" href="#warm-up-compiling">2.3. Warm-up (compiling)</a></p></li>
-<li><p><a class="reference external" href="#benchmark">2.4. Benchmark</a></p></li>
+<li><p><a class="reference external" href="#id1">2.3. Warm-up (compiling)</a></p></li>
+<li><p><a class="reference external" href="#id2">2.4. Benchmark</a></p></li>
 </ul>
 </li>
 <li><p><a class="reference external" href="#custom-cuda-kernels">3. Custom CUDA kernels</a></p>
 <ul>
 <li><p><a class="reference external" href="#supported-kernels">3.1. Supported kernels</a></p></li>
-<li><p><a class="reference external" href="#initialize-input-tensor">3.2. Initialize input tensor</a></p></li>
-<li><p><a class="reference external" href="#create-models-for-benchmarking">3.3. Create models for benchmarking</a></p></li>
+<li><p><a class="reference external" href="#id3">3.2. Initialize input tensor</a></p></li>
+<li><p><a class="reference external" href="#id4">3.3. Create models for benchmarking</a></p></li>
 <li><p><a class="reference external" href="#34-fuse-kernels-with-the-custom-cuda-kernels">3.4. Fuse kernels with the custom CUDA kernels</a></p></li>
-<li><p><a class="reference external" href="#warm-up-compiling">3.5. Warm-up (compiling)</a></p></li>
-<li><p><a class="reference external" href="#benchmark">3.6. Benchmark</a></p></li>
+<li><p><a class="reference external" href="#id5">3.5. Warm-up (compiling)</a></p></li>
+<li><p><a class="reference external" href="#id6">3.6. Benchmark</a></p></li>
 </ul>
 </li>
 </ul>
@@ -544,7 +544,7 @@ <h3>1.5. Benchmark<a class="headerlink" href="#benchmark" title="Permalink to th
 <div class="section" id="memory-efficient-fusion">
 <h2>2. Memory efficient fusion<a class="headerlink" href="#memory-efficient-fusion" title="Permalink to this headline"></a></h2>
 <p>How to use the memory efficient fusion?</p>
-<p>The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch
+<p>The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch.
 The AOT Autograd fuses all fusible areas of the model and also optimizes the backward graph with a novel mechanism called <a class="reference external" href="https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467">min-cut rematerialization</a>.
 Because the backward graph can be optimized, the memory efficient fusion shows a huge performance boost in training rather than inference.</p>
 <p>However, the AOT Autograd is still under development, so unexpected bugs could be occurred.
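The tutorial text patched above describes memory efficient fusion as AOT Autograd tracing plus min-cut rematerialization of the backward graph. A minimal sketch of that mechanism, using functorch's `aot_function` directly rather than oslo's wrapper; `log_compiler` is a hypothetical no-op backend (added here so the example runs without a GPU fuser and lets you inspect the captured graphs), not part of any library:

```python
# Sketch, assuming functorch (bundled with recent PyTorch releases) is available.
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition

def fused_region(x):
    # bias-add followed by GELU: a typical fusible pattern
    return torch.nn.functional.gelu(x + 1.0)

captured = []

def log_compiler(fx_module, example_inputs):
    captured.append(fx_module)  # keep the traced forward/backward graphs
    return fx_module            # hand the graph back unchanged (no real fusion)

aot_fn = aot_function(
    fused_region,
    fw_compiler=log_compiler,
    bw_compiler=log_compiler,
    partition_fn=min_cut_rematerialization_partition,
)

x = torch.randn(8, 16, requires_grad=True)
out = aot_fn(x)       # first call traces and partitions the joint graph
out.sum().backward()  # backward runs the rematerialized graph
```

In a real setup the no-op compiler would be replaced by an actual fusing backend; the min-cut partitioner decides which intermediates to save and which to recompute in backward, which is why the win shows up in training rather than inference.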

docs/_sources/tutorials/kernel_fusion.md.txt

Lines changed: 7 additions & 7 deletions
@@ -16,15 +16,15 @@
 * [2. Memory efficient fusion](#memory-efficient-fusion)
 + [2.1. Limitation](#limitation)
 + [2.2. Fuse kernels with AOT Autograd](#fuse-kernels-with-aot-autograd)
-+ [2.3. Warm-up (compiling)](#warm-up-compiling)
-+ [2.4. Benchmark](#benchmark)
++ [2.3. Warm-up (compiling)](#id1)
++ [2.4. Benchmark](#id2)
 * [3. Custom CUDA kernels](#custom-cuda-kernels)
 + [3.1. Supported kernels](#supported-kernels)
-+ [3.2. Initialize input tensor](#initialize-input-tensor)
-+ [3.3. Create models for benchmarking](#create-models-for-benchmarking)
++ [3.2. Initialize input tensor](#id3)
++ [3.3. Create models for benchmarking](#id4)
 + [3.4. Fuse kernels with the custom CUDA kernels](#34-fuse-kernels-with-the-custom-cuda-kernels)
-+ [3.5. Warm-up (compiling)](#warm-up-compiling)
-+ [3.6. Benchmark](#benchmark)
++ [3.5. Warm-up (compiling)](#id5)
++ [3.6. Benchmark](#id6)
 
 ## 1. JIT based fusion
 How to use the JIT based fusion?
@@ -96,7 +96,7 @@ oslo: 0.20798110961914062
 ## 2. Memory efficient fusion
 How to use the memory efficient fusion?
 
-The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch
+The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch.
 The AOT Autograd fuses all fusible areas of the model and also optimizes the backward graph with a novel mechanism called [min-cut rematerialization](https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467).
 Because the backward graph can be optimized, the memory efficient fusion shows a huge performance boost in training rather than inference.

docs/source/TUTORIALS/kernel_fusion.md

Lines changed: 7 additions & 7 deletions
@@ -16,15 +16,15 @@
 * [2. Memory efficient fusion](#memory-efficient-fusion)
 + [2.1. Limitation](#limitation)
 + [2.2. Fuse kernels with AOT Autograd](#fuse-kernels-with-aot-autograd)
-+ [2.3. Warm-up (compiling)](#warm-up-compiling)
-+ [2.4. Benchmark](#benchmark)
++ [2.3. Warm-up (compiling)](#id1)
++ [2.4. Benchmark](#id2)
 * [3. Custom CUDA kernels](#custom-cuda-kernels)
 + [3.1. Supported kernels](#supported-kernels)
-+ [3.2. Initialize input tensor](#initialize-input-tensor)
-+ [3.3. Create models for benchmarking](#create-models-for-benchmarking)
++ [3.2. Initialize input tensor](#id3)
++ [3.3. Create models for benchmarking](#id4)
 + [3.4. Fuse kernels with the custom CUDA kernels](#34-fuse-kernels-with-the-custom-cuda-kernels)
-+ [3.5. Warm-up (compiling)](#warm-up-compiling)
-+ [3.6. Benchmark](#benchmark)
++ [3.5. Warm-up (compiling)](#id5)
++ [3.6. Benchmark](#id6)
 
 ## 1. JIT based fusion
 How to use the JIT based fusion?
@@ -96,7 +96,7 @@ oslo: 0.20798110961914062
 ## 2. Memory efficient fusion
 How to use the memory efficient fusion?
 
-The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch
+The memory efficient fusion is a kernel fusion mechanism that uses the AOT Autograd engine, a novel engine developed by the functorch team at PyTorch.
 The AOT Autograd fuses all fusible areas of the model and also optimizes the backward graph with a novel mechanism called [min-cut rematerialization](https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467).
 Because the backward graph can be optimized, the memory efficient fusion shows a huge performance boost in training rather than inference.
