This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Commit 2448f68

Move and update benchmarks README.md
1 parent a44f1d3 commit 2448f68

2 files changed

+40
-65
lines changed

tc/benchmarks/README.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Building

These benchmarks are automatically built when ```WITH_CAFFE2=ON``` is passed.
If you have been following the instructions given [here](https://facebookresearch.github.io/TensorComprehensions/installation.html), you can use the command:

```
BUILD_TYPE=Release WITH_CAFFE2=ON CLANG_PREFIX=$(${CONDA_PREFIX}/bin/llvm-config --prefix) ./build.sh
```

# Running the autotuner manually
By default, a full evolutionary search is run with 25 generations and 100 candidates per generation, which can take a long time for some of the kernels. These settings can be changed with the corresponding gflags options: ```--tuner_gen_generations``` and ```--tuner_gen_pop_size```.

For instance, a shorter tuning search could be run as follows:
```
./build/tc/benchmarks/benchmark_batchmatmul --autotune=true --tuner_gen_generations=10 --tuner_gen_pop_size=20
```
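
To get a feel for how much work a given setting implies, the two options can be multiplied together. The sketch below is only a rough estimate: it assumes every generation evaluates a full population, which the text above does not state explicitly, and `tuner_budget` is a hypothetical helper name, not something shipped with the benchmarks.

```shell
# Hypothetical helper: rough number of candidates evaluated by one search,
# assuming each generation evaluates a full population.
tuner_budget() {
  generations="$1"
  pop_size="$2"
  echo $(( generations * pop_size ))
}

tuner_budget 25 100   # default settings: prints 2500
tuner_budget 10 20    # shorter search from the example above: prints 200
```

Under that assumption, the shorter search evaluates roughly a twelfth of the candidates of the default one.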

When running manually, the number of CPU compilation threads and the GPUs used for evaluation can be controlled via the gflags ```--tuner_threads``` and ```--tuner_devices```.

For instance, on a 4 GPU system with 20 threads:
```
./build/tc/benchmarks/benchmark_batchmatmul --autotune=true --tuner_gen_generations=10 --tuner_gen_pop_size=10 --tuner_threads=20 --tuner_devices="0,1,2,3"
```
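
On machines with a different number of GPUs, the comma-separated device list can be generated rather than typed by hand. This is a minimal sketch; `gpu_list` is a hypothetical helper, and it assumes the device IDs are contiguous and start at 0, as in the example above.

```shell
# Hypothetical helper: print "0,1,...,N-1" for N GPUs.
# Assumes device IDs are contiguous and start at 0.
gpu_list() {
  seq 0 $(( $1 - 1 )) | paste -sd, -
}

# Print the flag value for a 4 GPU system.
echo "--tuner_devices=$(gpu_list 4)"
```

The printed value can then be appended to the benchmark command line in place of the hard-coded list.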

# Running the autotuner with provided scripts
These examples are run as part of ```test.sh```, but they can also be run with full autotuning.

If you are the lucky owner of a supercomputer with ```slurm``` and ```sbatch```, you can run:
```
sbatch --array=1-40 ./tc/benchmarks/scripts/autotuner_parallel.sh
```

Results and logs will appear in the subdirectory ```tc/benchmarks/results_xxx```. You can tail the ```*.INFO``` files to see the best performance found by the autotuner.
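
One possible way to look at several logs at once is a small loop around `tail`. This is only a sketch: `tail_tuner_logs` is a hypothetical helper, the directory argument is whatever results directory your run produced, and nothing is assumed about the contents of the log lines.

```shell
# Hypothetical helper: print the last N lines (default 5) of every
# *.INFO file found in the given results directory.
tail_tuner_logs() {
  dir="$1"
  n="${2:-5}"
  for log in "$dir"/*.INFO; do
    # Skip the unexpanded pattern when no log file matches.
    [ -e "$log" ] || continue
    echo "== $log"
    tail -n "$n" "$log"
  done
}
```

It would be called with the results directory as the first argument and, optionally, the number of lines to show as the second.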

To control the CPU compilation threads and the GPUs used for evaluation, use the environment variables ```TUNER_THREADS``` and ```TUNER_GPUS```.
For instance, on a 4 GPU machine:
```
for f in $(seq 1 14); do TUNER_THREADS=20 TUNER_GPUS="0,1,2,3" SLURM_ARRAY_JOB_ID=local SLURM_ARRAY_TASK_ID=$f ./tc/benchmarks/scripts/autotuner_parallel.sh ; done
```
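
The one-line loop above can also be spread over several lines, which makes it easier to check what will be launched before committing to a long run. The sketch below is an illustration only: `run_local_tasks` and `DRY_RUN` are hypothetical names that do not exist in the repository, and the defaults of 20 threads and GPUs 0 to 3 simply mirror the example.

```shell
# Hypothetical helper: run the local tasks first..last one after another,
# or only print the commands when DRY_RUN=1 (DRY_RUN is not a repository option).
run_local_tasks() {
  first="$1"
  last="$2"
  script=./tc/benchmarks/scripts/autotuner_parallel.sh
  for f in $(seq "$first" "$last"); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only print what would be executed.
      echo "TUNER_THREADS=${TUNER_THREADS:-20} TUNER_GPUS=${TUNER_GPUS:-0,1,2,3} SLURM_ARRAY_JOB_ID=local SLURM_ARRAY_TASK_ID=$f $script"
    else
      TUNER_THREADS="${TUNER_THREADS:-20}" TUNER_GPUS="${TUNER_GPUS:-0,1,2,3}" \
        SLURM_ARRAY_JOB_ID=local SLURM_ARRAY_TASK_ID="$f" "$script"
    fi
  done
}

# Print the 14 commands without launching anything.
DRY_RUN=1
run_local_tasks 1 14
```

Setting `DRY_RUN=0` before the call would launch the tasks for real, with the same environment variables as the one-liner.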

tc/examples/README.md

Lines changed: 0 additions & 65 deletions
This file was deleted.
