This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 5f2f31c

ftynse authored and nicolasvasilache committed
update documentation
Fix multiple typos, grammatical errors, and inconsistencies in the documentation.
1 parent 3476610 commit 5f2f31c

6 files changed: +37 -31 lines


docs/source/framework/pytorch_integration/autograd_with_tc.rst

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@ Autograd with TC
 ================
 
 To create a :code:`torch.autograd` function backed by TC one can just use the
-:func:`make_autograd` helper function:
+:func:`make_autograd` helper function. Note that backward computations must be
+provided explicitly as TC functions.
 
 .. code-block:: python

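For illustration, here is a minimal sketch of the pattern described by this change: the backward computation is written as its own TC function and wired up with :func:`make_autograd`. The matmul/matmul_grad definitions and the use of naive options below are illustrative assumptions, not code taken from this commit.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # Forward and backward are separate, explicit TC functions.
    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    def matmul_grad(float(M, N) A, float(N, K) B, float(M, K) d_C) -> (d_A, d_B) {
        d_A(m, n) +=! d_C(m, r_k) * B(n, r_k)
        d_B(n, k) +=! d_C(r_m, k) * A(r_m, n)
    }
    """

    T = tc.define(MM, tc.make_naive_options_factory())
    matmul = tc.make_autograd(T.matmul, T.matmul_grad)

    A = torch.randn(32, 64, device='cuda', requires_grad=True)
    B = torch.randn(64, 128, device='cuda', requires_grad=True)
    C = matmul(A, B)
    C.sum().backward()  # gradients come from the explicit matmul_grad TC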
docs/source/framework/pytorch_integration/debugging.rst

Lines changed: 6 additions & 3 deletions
@@ -1,9 +1,10 @@
 Debugging
 =========
 
-We provide functions to enable debugging information, including the kernels
+We provide functions to enable the output of debugging information, including
+the kernels
 that TC generates. If you are curious about what happens when a TC is compiled
-and run, you can use these function to enable logging:
+and run, you can use these functions to enable logging:
 
 * :code:`dump_cuda`: print the generated cuda code
 
@@ -16,7 +17,9 @@ and run, you can use these function to enable logging:
 * :code:`debug_tuner`: print information logged by the autotuner.
 
 The logging functionality is backed by Google's glog library.
-To activate logging to screen, call :code:`tc.logtostderr(debug)`.
+To activate logging to screen, call :code:`tc.logtostderr(True)`.
+Otherwise, the logs are printed into uniquely-named files in the default
+temporary directory.
 
 Example usage
 -------------

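As a usage sketch for the functions listed above (only :code:`logtostderr`, :code:`dump_cuda` and :code:`debug_tuner` are named in the documentation being edited; the assumption here is that each takes a boolean):

.. code-block:: python

    import tensor_comprehensions as tc

    # Send glog output to stderr instead of files in the temporary directory.
    tc.logtostderr(True)
    # Print the CUDA code generated for every compiled TC.
    tc.dump_cuda(True)
    # Print information logged by the autotuner.
    tc.debug_tuner(True)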
docs/source/framework/pytorch_integration/frequently_asked_questions.rst

Lines changed: 4 additions & 3 deletions
@@ -78,7 +78,7 @@ This is expected because the autotuner is multithreaded and we don't
 enforce a strict order of evaluation: the best configurations
 from the previous generation may not be evaluated first in next generation.
 Furthermore, the other mutations in the current generation may perform worse
-than the last best knows configuration. Therefore the initial jump in best runtime at
+than the last best known configuration. Therefore the initial jump in best runtime at
 generation (i+1) is likely to appear temporarily.
 
 I sometimes see fluctuations in the best kernel time, why?
@@ -89,5 +89,6 @@ runtime may be noisy. So a 10-20% variation is expected and normal.
 How do I stop autotuning early and save cache?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 You can send a SIGINT signal (i.e. hit :code:`Ctrl+C`) to stop the autotuning
-early. The current compilation and evaluation jobs will be flushed and no new
-jobs will be submitted.
+early. All compilations and evaluations in progress will be completed, but no
+new compilation or evaluation will be started. Therefore, stopping the
+autotuner may take some time.

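To connect this with the API, a hedged sketch of a tuning run whose results survive an early Ctrl+C because they are written to a cache file; :func:`make_autotuned_options_factory` exists in the TC Python API, but the :code:`starting_options`, :code:`cache_filename` and :code:`store_to_cache` parameter names are assumptions here.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    """
    # Hitting Ctrl+C lets in-flight compilations/evaluations finish, then the
    # best options found so far are kept in the cache file.
    T = tc.define(MM, tc.make_autotuned_options_factory(
        starting_options='naive',
        cache_filename='/tmp/matmul_cache',
        store_to_cache=True))

    A = torch.randn(128, 256, device='cuda')
    B = torch.randn(256, 512, device='cuda')
    C = T.matmul(A, B)  # the first call for these sizes triggers tuning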
docs/source/framework/pytorch_integration/getting_started.rst

Lines changed: 9 additions & 8 deletions
@@ -11,23 +11,23 @@ layer with just a few lines of code (examples below).
 Here are a few cases where TC can be useful:
 
 * specialize your layer for uncommon tensor sizes and get better performance
-  than libraries *or*
+  than libraries, *or*
 
-* experiment with layer fusion like group convolution, ReLU, FC *or*
+* experiment with layer fusion like group convolution, ReLU, FC, *or*
 
-* synthesize new layers and get an efficient kernel automatically *or*
+* synthesize new layers and get an efficient kernel automatically, *or*
 
 * synthesize layers for tensors with unconventional memory layouts
 
 TC makes it easy to synthesize CUDA kernels for such cases and more. By providing
-TC integration with PyTorch, we hope to make it further easy for PyTorch users
+TC integration with PyTorch, we hope to make it easy for PyTorch users
 to express their operations and bridge the gap between research and engineering.
 
 Installation
 ------------
 
 We provide a :code:`conda` package for Tensor Comprehensions (only :code:`linux-64` package)
-to quickly get started with using TC. Follow the steps below to install TC :code:`conda` package:
+to quickly get started with TC. Follow the steps below to install TC :code:`conda` package:
 
 **Step 1:** Setup Anaconda
 Make sure :code:`conda` bin is in your :code:`$PATH`. To verify, run the following command:
@@ -39,15 +39,15 @@ Make sure :code:`conda` bin is in your :code:`$PATH`. To verify, run the followi
 This command should print the path of your :code:`conda` bin. If it doesn't,
 please activate :code:`conda` (see `installation`_).
 
-**Step 2:** Conda Install Tensor Comprehensions
+**Step 2:** Install Tensor Comprehensions with Anaconda
 
 Now, go ahead and install Tensor Comprehensions by running following command.
 
 .. code-block:: bash
 
     $ conda install -y -c pytorch -c tensorcomp tensor_comprehensions
 
-Now, you are ready to start using Tensor Comprehensions with PyTorch. As an example,
+You are now ready to start using Tensor Comprehensions with PyTorch. As an example,
 let's see a simple example of writing :code:`matmul` layer with TC in PyTorch.
 
 Example
@@ -70,4 +70,5 @@ backed by TC.
     C = TC.matmul(A, B)
 
 With a few lines of code, you can get a functional CUDA implementation for an
-operation expressed in TC. Read the documentation to find out more.
+operation expressed in TC. Note, however, that this simplest example is not
+expected to be fast. Read the documentation to find out more.

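The full snippet that the "Example" section builds up to looks roughly like the following; the :code:`tc.define` call and the TC string are a sketch reconstructed from the surrounding context lines (only :code:`C = TC.matmul(A, B)` appears verbatim in this diff).

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # A TC string defining a single matmul operation.
    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    """
    # Naive options compile correctly but, as the note above says, are not
    # expected to be fast; use the autotuner for performance.
    TC = tc.define(MM, tc.make_naive_options_factory())

    A = torch.randn(32, 64, device='cuda')
    B = torch.randn(64, 128, device='cuda')
    C = TC.matmul(A, B)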
docs/source/framework/pytorch_integration/python_api.rst

Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ Python API
 High-level API
 --------------
 
-We provide a high-level API which allows to easily experiment with Tensor
+We provide a high-level API which allows one to easily experiment with Tensor
 Comprehensions.
 
 .. autofunction:: define
@@ -36,11 +36,11 @@ performing numerical checks.
 
 .. autofunction:: assert_almost_equal
 
-Python bindings
----------------
+Caching and Configuration
+-------------------------
 
-Finally we also document a subset of the Python bindings that are commonly
-used.
+Finally we also document a subset of the helper types for caching and
+configuration that are commonly used.
 
 .. automodule:: tensor_comprehensions.tclib

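A brief sketch of how the caching and configuration helpers from :code:`tensor_comprehensions.tclib` are typically reached through the high-level API; the chained :code:`TunerConfig` setters shown here are assumptions about that API, not something this diff documents.

.. code-block:: python

    import tensor_comprehensions as tc

    # Configure a small evolutionary search (setter names are assumptions).
    config = tc.TunerConfig().threads(8).generations(3).pop_size(20)

    T = tc.define(
        """
        def relu(float(N, M) I) -> (O) { O(n, m) = fmax(I(n, m), 0) }
        """,
        tc.make_autotuned_options_factory(
            starting_options='naive', tuner_config=config))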
docs/source/framework/pytorch_integration/writing_layers.rst

Lines changed: 11 additions & 11 deletions
@@ -4,15 +4,15 @@ Writing TC operations
 This document focuses on writing TC operations using the high-level API.
 For examples of using the low-level API, see the Python API documentation.
 
-To create a CUDA kernel implementing an operation backed by TC, one can:
+To create a CUDA kernel implementing an operation backed by TC, one should:
 
-1. Create a callable helper object by calling :func:`define`
+1. Create a callable TC object by calling :func:`define`
 2. Create input PyTorch Tensors
-3. Call the helper object with on the input PyTorch Tensors
+3. Call the helper object with the input PyTorch Tensors
 
 When running, the backend ensures the TC is compiled and memoized for the
 given input tensor sizes (see the documentation for :func:`define` for more detals).
-Calling the helper object returned by :func:`define` executes the
+Calling the object returned by :func:`define` executes the
 corresponding operation and returns a list of outputs.
 If the operation has already been compiled, in the following runs, the TC
 backend will reuse the memoized compilation result and run the operation
@@ -23,8 +23,8 @@ Example
 
 The following example demonstrates the steps above.
 We use the :func:`make_naive_options_factory` builder function to provide
-naive :class:`MappingOptions`.
-At this time there is no notion of a default :class:`MappingOptions`.
+naive :class:`MappingOptions`. Naive options result in poor performance.
+At this time, there is no notion of a default :class:`MappingOptions`.
 Instead one should use the autotuner to perform an evolutionary search
 starting from an initial :class:`MappingOptions` object and return a better
 :class:`MappingOptions` object for a given TC function and sizes (more on this
@@ -48,13 +48,13 @@ below).
 
 
 Specifying MappingOptions
------------------------------
+-------------------------
 
 There are three ways to construct :class:`MappingOptions` when defining a TC:
 
 * **Naive MappingOptions**:
 
-  * :code:`naive`: this is provided to create a basic mapping strategy with
+  * :code:`naive`: this is provided to create a basic GPU mapping strategy with
     3-D tiling by 32x32x32, mapping to 256x256 blocks 32x8 threads. This
     should by no means be considered a good baseline but just a point to
     get started using TC. Once a correct TC is written, we recommend either
@@ -137,16 +137,16 @@ Tuning behavior can be modified by defining the TC with an optional
 Fixed TC, varying input sizes
 -----------------------------
 
-A TC definition can be reused but will trigger recompilatopm for different size
+A TC definition can be reused but will trigger recompilation for different size
 combinations. While we recommend tuning independently for each TC and input size
 variation, the best options found for a particular TC and input size
 combination may transfer well to another input size (especially if
 sizes are close and the kernels exhibit the same type of bottlenecs;
 i.e. memory-bound, latency-bound, instruction-issue-bound,
 compute-bound).
 
-Fake-templating
----------------
+Pseudo-templating
+-----------------
 
 The TC mapper requires statically affine tensor indexing functions.
 Without getting into deeper details, the dependence analysis process is

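Putting the three numbered steps together, a minimal sketch using a fused fully-connected + ReLU layer; the TC string is an illustrative assumption, while :func:`define`, :func:`make_naive_options_factory` and the call pattern follow the documentation above.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # Step 1: create a callable TC object.
    T = tc.define(
        """
        def fcrelu(float(B, M) I, float(N, M) W, float(N) bias) -> (O) {
            O(b, n) = bias(n)
            O(b, n) += I(b, r_m) * W(n, r_m)
            O(b, n) = fmax(O(b, n), 0)
        }
        """,
        tc.make_naive_options_factory())

    # Step 2: create input PyTorch tensors.
    I = torch.randn(16, 128, device='cuda')
    W = torch.randn(64, 128, device='cuda')
    bias = torch.randn(64, device='cuda')

    # Step 3: call the object; compilation is memoized per input sizes.
    O = T.fcrelu(I, W, bias)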