This repository was archived by the owner on Apr 28, 2023. It is now read-only.
Tensor Comprehensions (TC) is a fully functional C++ library to *automatically* synthesize high-performance machine learning kernels using [Halide](https://github.com/halide/Halide), [ISL](http://isl.gforge.inria.fr/), and NVRTC or LLVM. TC additionally provides basic integration with Caffe2 and pybind11 bindings for use with Python. We provide more details in our paper on [arXiv](https://arxiv.org/abs/1802.04730).
This library is designed to be highly portable, machine-learning-framework agnostic and only requires a simple tensor library with memory allocation, offloading and synchronization capabilities.
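To illustrate what a tensor comprehension expresses, consider matrix multiplication written in the TC language as in the arXiv paper: `C(m, n) +=! A(m, k) * B(k, n)`. Indices that appear only on the right-hand side (here `k`) are reduced over, and `+=!` zero-initializes the accumulator before the reduction. The following is a plain-NumPy sketch of those semantics, not the TC API itself:

```python
import numpy as np

# TC comprehension being illustrated (syntax per the TC paper):
#   def matmul(float(M,K) A, float(K,N) B) -> (C) {
#       C(m, n) +=! A(m, k) * B(k, n)
#   }

def matmul_comprehension(A, B):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    # The "!" in "+=!" means C is zero-initialized before accumulation.
    C = np.zeros((M, N), dtype=A.dtype)
    for m in range(M):
        for n in range(N):
            for k in range(K):  # k appears only on the RHS: reduction index
                C[m, n] += A[m, k] * B[k, n]
    return C

A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.allclose(matmul_comprehension(A, B), A @ B)
```

TC's compiler maps such comprehensions onto optimized GPU (or LLVM) code rather than executing the naive loop nest above.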
The following illustrates a short but powerful feature of the library: the capacity …
```cpp
#include <ATen/ATen.h>

#include "tc/aten/aten_compiler.h"
#include "tc/core/mapping_options.h"

// 1. Define and setup the TC compilation unit with CUDA memory management backed by ATen.
// …
```
After a few generations of autotuning on a 2-GPU P100 system, we see results resembling …

We have not yet characterized the precise fraction of peak performance we obtain, but it is not uncommon to reach 80%+ of peak shared-memory bandwidth after autotuning. Solid register-level optimizations are still in the works, but TC in its current form already addresses the productivity gap between the needs of research and the needs of production, which is why we are excited to share it with the entire community and bring this collaborative effort into the open.
# Installation / Documentation
You can find the documentation [here](https://facebookresearch.github.io/TensorComprehensions/), which contains instructions for building TC via Docker, via conda packages, or in a non-conda environment.
* **Slack**: For discussion around framework integration, build support, collaboration, etc., join our Slack channel at https://tensorcomprehensions.slack.com. You may need an invitation to join; contact us by email at tensorcomp@fb.com to get one.