This repository was archived by the owner on Aug 26, 2022. It is now read-only.

Commit a7ab69c

Merge pull request #33 from tunib-ai/kevin

merge changes from functorch upstream, fix docs

2 parents 3d7e792 + ecf16ed

File tree

16 files changed: +492 −54 lines


docs/TUTORIALS/tensor_model_parallelism.html

Lines changed: 3 additions & 3 deletions

@@ -486,11 +486,11 @@ <h3>1.1. Create model and tokenizer<a class="headerlink" href="#create-model-and
 <div class="section" id="parallelize-the-model">
 <h3>1.2. Parallelize the model<a class="headerlink" href="#parallelize-the-model" title="Permalink to this headline"></a></h3>
 <ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be same or smaller than total number of gpus.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be same or smaller than total num of gpus.</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be power of 2. (e.g. 2, 4, 8, 16, …)</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be positive number.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be same or greater than hidden size</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code> must be same or greater than the number of heads</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">hidden</span> <span class="pre">size</span></code> must be same or greater than <code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">the</span> <span class="pre">number</span> <span class="pre">of</span> <span class="pre">heads</span></code> must be same or greater than <code class="docutils literal notranslate"><span class="pre">tensor_parallel_size</span></code></p></li>
 </ul>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">oslo</span>

docs/_sources/tutorials/tensor_model_parallelism.md.txt

Lines changed: 3 additions & 3 deletions

@@ -49,11 +49,11 @@ tokenizer = AutoTokenizer.from_pretrained("gpt2")
 ```
 
 ### 1.2. Parallelize the model
-- ``tensor_parallel_size`` must be same or smaller than total number of gpus.
+- ``tensor_parallel_size`` must be same or smaller than total num of gpus.
 - ``tensor_parallel_size`` must be power of 2. (e.g. 2, 4, 8, 16, ...)
 - ``tensor_parallel_size`` must be positive number.
-- ``tensor_parallel_size`` must be same or greater than hidden size
-- ``tensor_parallel_size`` must be same or greater than the number of heads
+- ``hidden size`` must be same or greater than ``tensor_parallel_size``
+- ``the number of heads`` must be same or greater than ``tensor_parallel_size``
 
 ```python
 import oslo

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/source/TUTORIALS/tensor_model_parallelism.md

Lines changed: 3 additions & 3 deletions

@@ -49,11 +49,11 @@ tokenizer = AutoTokenizer.from_pretrained("gpt2")
 ```
 
 ### 1.2. Parallelize the model
-- ``tensor_parallel_size`` must be same or smaller than total number of gpus.
+- ``tensor_parallel_size`` must be same or smaller than total num of gpus.
 - ``tensor_parallel_size`` must be power of 2. (e.g. 2, 4, 8, 16, ...)
 - ``tensor_parallel_size`` must be positive number.
-- ``tensor_parallel_size`` must be same or greater than hidden size
-- ``tensor_parallel_size`` must be same or greater than the number of heads
+- ``hidden size`` must be same or greater than ``tensor_parallel_size``
+- ``the number of heads`` must be same or greater than ``tensor_parallel_size``
 
 ```python
 import oslo
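The constraint list in the tutorial diff above reads naturally as runtime validation. A minimal sketch of those checks follows; the function and argument names are illustrative only, not part of oslo's API, which performs its own checks internally:

```python
def validate_tensor_parallel_size(tensor_parallel_size, num_gpus, hidden_size, num_heads):
    # Hypothetical validator mirroring the documented constraints.
    assert tensor_parallel_size > 0, "tensor_parallel_size must be a positive number"
    assert tensor_parallel_size <= num_gpus, "must not exceed the total number of GPUs"
    # A power of two has exactly one bit set, so n & (n - 1) == 0.
    assert tensor_parallel_size & (tensor_parallel_size - 1) == 0, "must be a power of 2"
    assert hidden_size >= tensor_parallel_size, "hidden size must be >= tensor_parallel_size"
    assert num_heads >= tensor_parallel_size, "number of heads must be >= tensor_parallel_size"

# A GPT-2-like configuration satisfies every constraint.
validate_tensor_parallel_size(4, num_gpus=8, hidden_size=768, num_heads=12)
```

Note that the last two constraints are the ones this commit rewrites: it is the hidden size and head count that must be at least `tensor_parallel_size`, not the other way around.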

oslo/__version__.py

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 # Copyright 2021 TUNiB Inc.
 
-version = "2.0.0"
+version = "2.0.1"
Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 """
 LAST UPSTREAM INFORMATION
 
-- date: 2022/02/14
-- commit: https://github.com/pytorch/functorch/commit/cd41d6ebc0402d94ae6af51f163ee728277a7aa4
+- date: 2022/02/21
+- commit: https://github.com/pytorch/functorch/commit/0c0f325ba3c83e70c215f231cfd810af68141767
 """

oslo/pytorch/kernel_fusion/mem_efficient/aot_autograd.py

Lines changed: 1 addition & 0 deletions

@@ -110,6 +110,7 @@ def _reshape_alias(x, shape, strides):
     return aten.view(x, shape)
 
 
+
 def create_aot_autograd_function(
     flat_fn, fw_compiler, bw_compiler, partition_fn, decompositions, grad_state
 ):

oslo/pytorch/kernel_fusion/mem_efficient/compilers.py

Lines changed: 13 additions & 3 deletions

@@ -13,8 +13,16 @@
 )
 
 
+def _canonicalize(fx_g):
+    for node in fx_g.graph.nodes:
+        if node.target == torch.ops.aten._s_where:
+            node.target = torch.ops.aten.where
+    fx_g.recompile()
+    return fx_g
+
+
 def ts_compile(fx_g, _):
-    # print(fx_g.code)
+    fx_g = _canonicalize(fx_g)
     for node in fx_g.graph.nodes:
         if node.target == torch.ops.aten.new_zeros:
             if node.args[1] == []:

@@ -215,6 +223,7 @@ def nop(f, _):
 
 
 def simple_ts_compile(fx_g, _):
+    fx_g = _canonicalize(fx_g)
     f = torch.jit.script(fx_g)
     f = torch.jit.freeze(f.eval())
     return f

@@ -284,12 +293,13 @@ def debug_compile(fx_g, inps):
 ##############################################################
 import torch
 import torch.fx as fx
-from torch.compile import minimizer, check_nvfuser_subprocess
+from functorch.compile import minifier, check_nvfuser_subprocess
 inps = {[(i.shape, i.dtype) for i in inps]}
+inps = [torch.ones(shape, dtype=dtype, device='cuda') for (shape, dtype) in inps]
 from foo import FxModule
 mod = FxModule().cuda()
 with torch.jit.fuser("fuser2"):
-    minimizer(fx.symbolic_trace(mod), inps, check_nvfuser_subprocess)
+    minifier(fx.symbolic_trace(mod), inps, check_nvfuser_subprocess)
 """
 )
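The `_canonicalize` pass added in this commit retargets every `aten._s_where` node to the public `aten.where` op and then recompiles the FX graph before scripting. The pattern, walk the node list, swap targets, recompile, can be sketched without torch; `Node`, `Graph`, and `GraphModule` below are simplified stand-ins for the torch.fx types, not the real API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # In torch.fx, target is the callable/op the node invokes.
    target: str

@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)

class GraphModule:
    def __init__(self, nodes):
        self.graph = Graph(list(nodes))
        self.recompiled = False

    def recompile(self):
        # torch.fx regenerates the module's forward() from the graph here;
        # this stand-in only records that it happened.
        self.recompiled = True

def canonicalize(gm: GraphModule) -> GraphModule:
    # Mirror of _canonicalize: rewrite private _s_where targets to where.
    for node in gm.graph.nodes:
        if node.target == "aten._s_where":
            node.target = "aten.where"
    gm.recompile()
    return gm
```

Running the pass over a toy graph rewrites only the `_s_where` node and leaves every other node untouched, which is why both `ts_compile` and `simple_ts_compile` can call it unconditionally before `torch.jit.script`.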
