This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 5f2f31c

ftynse authored and nicolasvasilache committed
update documentation
Fix multiple typos, grammatical errors, and inconsistencies in the documentation.
1 parent 3476610 commit 5f2f31c

6 files changed: +37 -31 lines


docs/source/framework/pytorch_integration/autograd_with_tc.rst

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@ Autograd with TC
 ================
 
 To create a :code:`torch.autograd` function backed by TC one can just use the
-:func:`make_autograd` helper function:
+:func:`make_autograd` helper function. Note that backward computations must be
+provided explicitly as TC functions.
 
 .. code-block:: python

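For illustration, here is a minimal sketch of the pattern described by this change: the backward computation is written as its own TC function and wired up with :func:`make_autograd`. The matmul/matmul_grad definitions and the use of naive options below are illustrative assumptions, not code taken from this commit.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # Forward and backward are separate, explicit TC functions.
    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    def matmul_grad(float(M, N) A, float(N, K) B, float(M, K) d_C) -> (d_A, d_B) {
        d_A(m, n) +=! d_C(m, r_k) * B(n, r_k)
        d_B(n, k) +=! d_C(r_m, k) * A(r_m, n)
    }
    """

    T = tc.define(MM, tc.make_naive_options_factory())
    matmul = tc.make_autograd(T.matmul, T.matmul_grad)

    A = torch.randn(32, 64, device='cuda', requires_grad=True)
    B = torch.randn(64, 128, device='cuda', requires_grad=True)
    C = matmul(A, B)
    C.sum().backward()  # gradients come from the explicit matmul_grad TC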
docs/source/framework/pytorch_integration/debugging.rst

Lines changed: 6 additions & 3 deletions
@@ -1,9 +1,10 @@
 Debugging
 =========
 
-We provide functions to enable debugging information, including the kernels
+We provide functions to enable the output of debugging information, including
+the kernels
 that TC generates. If you are curious about what happens when a TC is compiled
-and run, you can use these function to enable logging:
+and run, you can use these functions to enable logging:
 
 * :code:`dump_cuda`: print the generated cuda code
 
@@ -16,7 +17,9 @@ and run, you can use these function to enable logging:
 * :code:`debug_tuner`: print information logged by the autotuner.
 
 The logging functionality is backed by Google's glog library.
-To activate logging to screen, call :code:`tc.logtostderr(debug)`.
+To activate logging to screen, call :code:`tc.logtostderr(True)`.
+Otherwise, the logs are printed into uniquely-named files in the default
+temporary directory.
 
 Example usage
 -------------

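As a usage sketch for the functions listed above (only :code:`logtostderr`, :code:`dump_cuda` and :code:`debug_tuner` are named in the documentation being edited; the assumption here is that each takes a boolean):

.. code-block:: python

    import tensor_comprehensions as tc

    # Send glog output to stderr instead of files in the temporary directory.
    tc.logtostderr(True)
    # Print the CUDA code generated for every compiled TC.
    tc.dump_cuda(True)
    # Print information logged by the autotuner.
    tc.debug_tuner(True)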
docs/source/framework/pytorch_integration/frequently_asked_questions.rst

Lines changed: 4 additions & 3 deletions
@@ -78,7 +78,7 @@ This is expected because the autotuner is multithreaded and we don't
 enforce a strict order of evaluation: the best configurations
 from the previous generation may not be evaluated first in next generation.
 Furthermore, the other mutations in the current generation may perform worse
-than the last best knows configuration. Therefore the initial jump in best runtime at
+than the last best known configuration. Therefore the initial jump in best runtime at
 generation (i+1) is likely to appear temporarily.
 
 I sometimes see fluctuations in the best kernel time, why?
@@ -89,5 +89,6 @@ runtime may be noisy. So a 10-20% variation is expected and normal.
 How do I stop autotuning early and save cache?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 You can send a SIGINT signal (i.e. hit :code:`Ctrl+C`) to stop the autotuning
-early. The current compilation and evaluation jobs will be flushed and no new
-jobs will be submitted.
+early. All compilations and evaluations in progress will be completed, but no
+new compilation or evaluation will be started. Therefore, stopping the
+autotuner may take some time.

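To connect this with the API, a hedged sketch of a tuning run whose results survive an early Ctrl+C because they are written to a cache file; :func:`make_autotuned_options_factory` exists in the TC Python API, but the :code:`starting_options`, :code:`cache_filename` and :code:`store_to_cache` parameter names are assumptions here.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    """
    # Hitting Ctrl+C lets in-flight compilations/evaluations finish, then the
    # best options found so far are kept in the cache file.
    T = tc.define(MM, tc.make_autotuned_options_factory(
        starting_options='naive',
        cache_filename='/tmp/matmul_cache',
        store_to_cache=True))

    A = torch.randn(128, 256, device='cuda')
    B = torch.randn(256, 512, device='cuda')
    C = T.matmul(A, B)  # the first call for these sizes triggers tuning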
docs/source/framework/pytorch_integration/getting_started.rst

Lines changed: 9 additions & 8 deletions
@@ -11,23 +11,23 @@ layer with just a few lines of code (examples below).
 Here are a few cases where TC can be useful:
 
 * specialize your layer for uncommon tensor sizes and get better performance
-  than libraries *or*
+  than libraries, *or*
 
-* experiment with layer fusion like group convolution, ReLU, FC *or*
+* experiment with layer fusion like group convolution, ReLU, FC, *or*
 
-* synthesize new layers and get an efficient kernel automatically *or*
+* synthesize new layers and get an efficient kernel automatically, *or*
 
 * synthesize layers for tensors with unconventional memory layouts
 
 TC makes it easy to synthesize CUDA kernels for such cases and more. By providing
-TC integration with PyTorch, we hope to make it further easy for PyTorch users
+TC integration with PyTorch, we hope to make it easy for PyTorch users
 to express their operations and bridge the gap between research and engineering.
 
 Installation
 ------------
 
 We provide a :code:`conda` package for Tensor Comprehensions (only :code:`linux-64` package)
-to quickly get started with using TC. Follow the steps below to install TC :code:`conda` package:
+to quickly get started with TC. Follow the steps below to install TC :code:`conda` package:
 
 **Step 1:** Setup Anaconda
 Make sure :code:`conda` bin is in your :code:`$PATH`. To verify, run the following command:
@@ -39,15 +39,15 @@ Make sure :code:`conda` bin is in your :code:`$PATH`. To verify, run the followi
 This command should print the path of your :code:`conda` bin. If it doesn't,
 please activate :code:`conda` (see `installation`_).
 
-**Step 2:** Conda Install Tensor Comprehensions
+**Step 2:** Install Tensor Comprehensions with Anaconda
 
 Now, go ahead and install Tensor Comprehensions by running following command.
 
 .. code-block:: bash
 
     $ conda install -y -c pytorch -c tensorcomp tensor_comprehensions
 
-Now, you are ready to start using Tensor Comprehensions with PyTorch. As an example,
+You are now ready to start using Tensor Comprehensions with PyTorch. As an example,
 let's see a simple example of writing :code:`matmul` layer with TC in PyTorch.
 
 Example
@@ -70,4 +70,5 @@ backed by TC.
     C = TC.matmul(A, B)
 
 With a few lines of code, you can get a functional CUDA implementation for an
-operation expressed in TC. Read the documentation to find out more.
+operation expressed in TC. Note, however, that this simplest example is not
+expected to be fast. Read the documentation to find out more.

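The full snippet that the "Example" section builds up to looks roughly like the following; the :code:`tc.define` call and the TC string are a sketch reconstructed from the surrounding context lines (only :code:`C = TC.matmul(A, B)` appears verbatim in this diff).

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # A TC string defining a single matmul operation.
    MM = """
    def matmul(float(M, N) A, float(N, K) B) -> (C) {
        C(m, k) +=! A(m, r_n) * B(r_n, k)
    }
    """
    # Naive options compile correctly but, as the note above says, are not
    # expected to be fast; use the autotuner for performance.
    TC = tc.define(MM, tc.make_naive_options_factory())

    A = torch.randn(32, 64, device='cuda')
    B = torch.randn(64, 128, device='cuda')
    C = TC.matmul(A, B)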
docs/source/framework/pytorch_integration/python_api.rst

Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ Python API
 High-level API
 --------------
 
-We provide a high-level API which allows to easily experiment with Tensor
+We provide a high-level API which allows one to easily experiment with Tensor
 Comprehensions.
 
 .. autofunction:: define
@@ -36,11 +36,11 @@ performing numerical checks.
 
 .. autofunction:: assert_almost_equal
 
-Python bindings
----------------
+Caching and Configuration
+-------------------------
 
-Finally we also document a subset of the Python bindings that are commonly
-used.
+Finally we also document a subset of the helper types for caching and
+configuration that are commonly used.
 
 .. automodule:: tensor_comprehensions.tclib

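A brief sketch of how the caching and configuration helpers from :code:`tensor_comprehensions.tclib` are typically reached through the high-level API; the chained :code:`TunerConfig` setters shown here are assumptions about that API, not something this diff documents.

.. code-block:: python

    import tensor_comprehensions as tc

    # Configure a small evolutionary search (setter names are assumptions).
    config = tc.TunerConfig().threads(8).generations(3).pop_size(20)

    T = tc.define(
        """
        def relu(float(N, M) I) -> (O) { O(n, m) = fmax(I(n, m), 0) }
        """,
        tc.make_autotuned_options_factory(
            starting_options='naive', tuner_config=config))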
docs/source/framework/pytorch_integration/writing_layers.rst

Lines changed: 11 additions & 11 deletions
@@ -4,15 +4,15 @@ Writing TC operations
 This document focuses on writing TC operations using the high-level API.
 For examples of using the low-level API, see the Python API documentation.
 
-To create a CUDA kernel implementing an operation backed by TC, one can:
+To create a CUDA kernel implementing an operation backed by TC, one should:
 
-1. Create a callable helper object by calling :func:`define`
+1. Create a callable TC object by calling :func:`define`
 2. Create input PyTorch Tensors
-3. Call the helper object with on the input PyTorch Tensors
+3. Call the helper object with the input PyTorch Tensors
 
 When running, the backend ensures the TC is compiled and memoized for the
 given input tensor sizes (see the documentation for :func:`define` for more detals).
-Calling the helper object returned by :func:`define` executes the
+Calling the object returned by :func:`define` executes the
 corresponding operation and returns a list of outputs.
 If the operation has already been compiled, in the following runs, the TC
 backend will reuse the memoized compilation result and run the operation
@@ -23,8 +23,8 @@ Example
 
 The following example demonstrates the steps above.
 We use the :func:`make_naive_options_factory` builder function to provide
-naive :class:`MappingOptions`.
-At this time there is no notion of a default :class:`MappingOptions`.
+naive :class:`MappingOptions`. Naive options result in poor performance.
+At this time, there is no notion of a default :class:`MappingOptions`.
 Instead one should use the autotuner to perform an evolutionary search
 starting from an initial :class:`MappingOptions` object and return a better
 :class:`MappingOptions` object for a given TC function and sizes (more on this
@@ -48,13 +48,13 @@ below).
 
 
 Specifying MappingOptions
------------------------------
+-------------------------
 
 There are three ways to construct :class:`MappingOptions` when defining a TC:
 
 * **Naive MappingOptions**:
 
-  * :code:`naive`: this is provided to create a basic mapping strategy with
+  * :code:`naive`: this is provided to create a basic GPU mapping strategy with
     3-D tiling by 32x32x32, mapping to 256x256 blocks 32x8 threads. This
     should by no means be considered a good baseline but just a point to
     get started using TC. Once a correct TC is written, we recommend either
@@ -137,16 +137,16 @@ Tuning behavior can be modified by defining the TC with an optional
 Fixed TC, varying input sizes
 -----------------------------
 
-A TC definition can be reused but will trigger recompilatopm for different size
+A TC definition can be reused but will trigger recompilation for different size
 combinations. While we recommend tuning independently for each TC and input size
 variation, the best options found for a particular TC and input size
 combination may transfer well to another input size (especially if
 sizes are close and the kernels exhibit the same type of bottlenecs;
 i.e. memory-bound, latency-bound, instruction-issue-bound,
 compute-bound).
 
-Fake-templating
----------------
+Pseudo-templating
+-----------------
 
 The TC mapper requires statically affine tensor indexing functions.
 Without getting into deeper details, the dependence analysis process is

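Putting the three numbered steps together, a minimal sketch using a fused fully-connected + ReLU layer; the TC string is an illustrative assumption, while :func:`define`, :func:`make_naive_options_factory` and the call pattern follow the documentation above.

.. code-block:: python

    import torch
    import tensor_comprehensions as tc

    # Step 1: create a callable TC object.
    T = tc.define(
        """
        def fcrelu(float(B, M) I, float(N, M) W, float(N) bias) -> (O) {
            O(b, n) = bias(n)
            O(b, n) += I(b, r_m) * W(n, r_m)
            O(b, n) = fmax(O(b, n), 0)
        }
        """,
        tc.make_naive_options_factory())

    # Step 2: create input PyTorch tensors.
    I = torch.randn(16, 128, device='cuda')
    W = torch.randn(64, 128, device='cuda')
    bias = torch.randn(64, device='cuda')

    # Step 3: call the object; compilation is memoized per input sizes.
    O = T.fcrelu(I, W, bias)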