Writing TC operations
=====================

.. automodule:: tensor_comprehensions

This document focuses on writing TC operations using the high-level API.
For examples of using the low-level API, see the Python API documentation.

To create a CUDA kernel implementing an operation backed by TC, one should:

1. Create a callable TC object by calling :func:`define`
2. Create input PyTorch Tensors
3. Call the TC object with the input PyTorch Tensors

When running, the backend ensures the TC is compiled and memoized for the
given input tensor sizes (see the documentation for :func:`define` for more details).
Calling the object returned by :func:`define` executes the
corresponding operation and returns a list of outputs.
If the operation has already been compiled, in the following runs, the TC
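The compile-and-memoize behavior described above can be illustrated with a small, generic Python sketch. This is not the TC implementation, only the caching pattern it describes; `compile_fn` and the length-based size key are stand-ins for TC's CUDA compilation and tensor-size signature:

```python
# Generic sketch of compile-and-memoize-by-input-sizes; not the TC
# implementation. `compile_fn` stands in for TC's CUDA compilation.
class SizeMemoizingOp:
    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # builds a callable "kernel" for given sizes
        self.cache = {}               # input-size signature -> compiled kernel

    def __call__(self, *inputs):
        key = tuple(len(x) for x in inputs)  # stand-in for tensor sizes
        if key not in self.cache:            # compile only the first time
            self.cache[key] = self.compile_fn(key)
        return self.cache[key](*inputs)      # later calls reuse the kernel

compilations = []

def compile_fn(sizes):
    compilations.append(sizes)               # record each compilation
    return lambda *xs: [sum(map(sum, xs))]   # "kernel" returns a list of outputs

op = SizeMemoizingOp(compile_fn)
first = op([1.0, 2.0], [3.0, 4.0])   # triggers compilation for sizes (2, 2)
second = op([5.0, 6.0], [7.0, 8.0])  # same sizes: memoized kernel is reused
```

After both calls, only one compilation has happened, which is the property the backend relies on to amortize compilation cost across runs.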

Example
-------

The following example demonstrates the steps above.
We use the :func:`make_naive_options_factory` builder function to provide
naive :class:`~tclib.MappingOptions`. Naive options result in poor performance.
At this time, there is no notion of a default :class:`~tclib.MappingOptions`.
Instead one should use the autotuner to perform an evolutionary search
starting from an initial :class:`~tclib.MappingOptions` object and return a better
:class:`~tclib.MappingOptions` object for a given TC function and sizes (more on this
below).

.. code-block:: python
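With the example code elided above, a rough sketch of what steps 1-3 look like with the high-level API follows. It assumes the :func:`tc.define` and :func:`tc.make_naive_options_factory` entry points named in this document, a CUDA-capable GPU, and an illustrative elementwise ``add`` TC; treat the exact names, shapes, and return convention as assumptions rather than as the document's own example:

```python
import torch
import tensor_comprehensions as tc

# Sketch only: assumes a CUDA GPU and the high-level API named in this
# document (tc.define, tc.make_naive_options_factory). The `add` TC and
# its shapes are illustrative, not from the original example.
add_tc = """
def add(float(N) A, float(N) B) -> (C) {
    C(n) = A(n) + B(n)
}
"""

# 1. Create a callable TC object (naive options: correct but slow).
T = tc.define(add_tc, tc.make_naive_options_factory())

# 2. Create input PyTorch tensors on the GPU.
A = torch.randn(100, device='cuda')
B = torch.randn(100, device='cuda')

# 3. Call the TC object with the input tensors to get the output(s).
C = T.add(A, B)
```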

Specifying MappingOptions
-------------------------

There are three ways to construct :class:`~tclib.MappingOptions` when defining a TC:

* **Naive MappingOptions**:

  * :code:`naive`: this is provided to create a basic GPU mapping strategy with
    3-D tiling by 32x32x32, mapping to 256x256 blocks of 32x8 threads. This
    should by no means be considered a good baseline but just a point to
    get started using TC. Once a correct TC is written, we recommend either
    using options loaded from a :class:`~tclib.MappingOptionsCache` or resulting from
    a tuning run. One can also modify a :class:`~tclib.MappingOptions` object
    programmatically (see the API documentation).

* **Loading from MappingOptionsCache**: a :class:`~tclib.MappingOptionsCache` provides
  a simple interface to load the best options from a previous tuning run.

* **Autotuning**: A kernel can be autotuned for fixed input tensor sizes.

Loading from cache
------------------

Loading the best options from a previously serialized :class:`~tclib.MappingOptionsCache`
can be achieved by making a factory function with
:func:`make_load_from_cache_options_factory` and passing it as an argument to the
:func:`define` function:

.. code-block:: python

            torch.randn(G, D, device='cuda'))
    Sum, SumSq, O = T.group_normalization(I, gamma, beta)

One can also use the low-level :class:`~tclib.MappingOptionsCache`.

Autotuning
----------

that case, the compilation and evaluation jobs currently in flight will
be flushed, but no new compilation job will be created. Once the jobs in
flight are flushed, saving to cache occurs (if requested) and the best
:class:`~tclib.MappingOptions` found so far will be returned.

Tuning behavior can be modified by defining the TC with an optional
:class:`~tclib.TunerConfig` parameter constructed as such:
:code:`tuner_config=tc.TunerConfig().threads(5).generations(3).pop_size(5)`.
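The tuner knobs above (threads, generations, population size) drive an evolutionary search. The following toy Python sketch shows the general pattern on a scalar cost function; it is not the TC tuner, and ``cost`` stands in for measured kernel runtime:

```python
import random

def evolutionary_search(cost, pop_size=5, generations=3, seed=0):
    # Toy evolutionary search; `cost` stands in for measured kernel runtime.
    rng = random.Random(seed)
    best = min((rng.uniform(-10.0, 10.0) for _ in range(pop_size)), key=cost)
    history = [cost(best)]
    for _ in range(generations):
        # Each generation mutates around the best candidate found so far.
        candidates = [best + rng.gauss(0.0, 1.0) for _ in range(pop_size)]
        best = min(candidates + [best], key=cost)  # keep the incumbent
        history.append(cost(best))
    return best, history

# Minimize a toy cost using the same knob names as TunerConfig above.
best, history = evolutionary_search(lambda x: (x - 3.0) ** 2,
                                    pop_size=5, generations=3)
```

Because the incumbent is kept in every generation, the best cost found can never get worse from one generation to the next, which is why interrupting the real tuner early still yields the best options found so far.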

.. note::