<div align="center">
  <img width="500px" src="doc/images/KernelTuner-logo.png"/>
</div>

---
[Build Status](https://github.com/KernelTuner/kernel_tuner/actions/workflows/build-test-python-package.yml)
[Codecov](https://codecov.io/gh/KernelTuner/kernel_tuner)
[PyPI](https://pypi.python.org/pypi/kernel_tuner/)
[DOI](https://zenodo.org/badge/latestdoi/54894320)
[SonarCloud](https://sonarcloud.io/dashboard?id=KernelTuner_kernel_tuner)
[OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/6573)
[FAIR software](https://fair-software.eu)
---

Create optimized GPU applications in any mainstream GPU programming language (CUDA, HIP, OpenCL, OpenACC).

What Kernel Tuner does:

- Works as an external tool to benchmark and optimize GPU kernels in isolation
- Can be used directly on existing kernel code without extensive changes
- Can be used with applications in any host programming language
- Blazing fast search space construction
- More than 20 [optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html) to speed up tuning
- Energy measurements and optimizations [(power capping, clock frequency tuning)](https://arxiv.org/abs/2211.07260)
- ... and much more! For example, [caching](https://kerneltuner.github.io/kernel_tuner/stable/cache_files.html), [output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html), [tuning host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html), and [user-defined metrics](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html); see [the full documentation](https://kerneltuner.github.io/kernel_tuner/stable/index.html).

## Installation

- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-pyhipl) compiler installed
- Then type: `pip install kernel_tuner[cuda]`, `pip install kernel_tuner[opencl]`, or `pip install kernel_tuner[hip]`
- or why not all of them: `pip install kernel_tuner[cuda,opencl,hip]`

More information on installation, also for other languages, can be found in the [installation guide](http://kerneltuner.github.io/kernel_tuner/stable/install.html).
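
A quick way to check the installation is to import the package from Python, as in the minimal sketch below (it only reads the version recorded by pip; no GPU or compiler is exercised at this step):

```python
# Minimal post-install check: confirm the package imports and report the
# version that pip recorded for it.
from importlib.metadata import version

import kernel_tuner

print("Kernel Tuner", version("kernel_tuner"))
```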

## Example

```python
import numpy as np
from kernel_tuner import tune_kernel

# CUDA kernel with a tunable thread block size; block_size_x is defined
# at compile time by Kernel Tuner for each configuration
kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""

# problem size and kernel arguments
n = np.int32(10000000)

a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

args = [c, a, b, n]

# values to try for each tunable parameter
tune_params = {"block_size_x": [32, 64, 128, 256, 512]}

tune_kernel("vector_add", kernel_string, n, args, tune_params)
```
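
`tune_kernel` also returns the benchmarked configurations together with a record of the environment the run was performed in. The sketch below continues from the example above and enables a few of the optional features mentioned earlier (output verification, user-defined metrics, caching); it reuses the variables defined above, and the keyword arguments follow the linked documentation:

```python
# Continues from the example above: capture the results and enable a few
# optional features (verification, a user-defined metric, and a cache file).
results, env = tune_kernel(
    "vector_add", kernel_string, n, args, tune_params,
    answer=[a + b, None, None, None],  # reference output; only c (the first argument) is verified
    metrics={"GFLOP/s": lambda p: (n / 1e9) / (p["time"] / 1e3)},  # p["time"] is in milliseconds
    cache="vector_add_cache.json",     # store benchmarked configurations for later reuse
)

# results is a list of dicts, one per benchmarked configuration;
# env describes the hardware and software environment of the run
best = min(results, key=lambda conf: conf["time"])
print("best configuration:", best)
```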

More [examples here](https://kerneltuner.github.io/kernel_tuner/stable/examples.html).

## Resources

- [Full documentation](https://kerneltuner.github.io/kernel_tuner/stable/)
- Guides:
  - [Getting Started](https://kerneltuner.github.io/kernel_tuner/stable/quickstart.html)
  - [Convolution](https://kerneltuner.github.io/kernel_tuner/stable/convolution.html)
  - [Diffusion](https://kerneltuner.github.io/kernel_tuner/stable/diffusion.html)
  - [Matrix Multiplication](https://kerneltuner.github.io/kernel_tuner/stable/matrix_multiplication.html)
- Features & Use cases:
  - [Full list of examples](https://kerneltuner.github.io/kernel_tuner/stable/examples.html)
  - [Output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html)
  - [Test GPU code from Python](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/test_vector_add.py)
  - [Tune code in both host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html)
  - [Optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html)
  - [Mixed-precision & Accuracy tuning](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/accuracy.py)
  - [Custom metrics & tuning objectives](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html)
- **Kernel Tuner Tutorial** slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2022_SURF/SURF22-Kernel-Tuner-Tutorial.pdf), hands-on:
  - Vector add example [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)]
  - Tuning thread block dimensions [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)]
  - Search space restrictions & output verification [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)]
  - Visualization & search space optimization [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)]
- **Energy Efficient GPU Computing** tutorial slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2023_Supercomputing/SC23.pdf), hands-on:
  - Kernel Tuner for GPU energy measurements [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)]
  - Code optimizations for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)]
  - Mixed precision and accuracy tuning [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)]
  - Optimizing for time vs. for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)] [[Colab](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)]

## Kernel Tuner ecosystem

<img width="250px" src="doc/images/kernel_launcher.png"/><br />**Kernel Launcher**: C++ magic to integrate auto-tuned kernels into C++ applications

<img width="250px" src="doc/images/kernel_float.png"/><br />**Kernel Float**: C++ data types for mixed-precision CUDA kernel programming

<img width="275px" src="doc/images/kernel_dashboard.png"/><br />**Kernel Dashboard**: monitor, analyze, and visualize auto-tuning runs

## Communication & Contribution

- GitHub [Issues](https://github.com/KernelTuner/kernel_tuner/issues): Bug reports, install issues, feature requests, work in progress
- GitHub [Discussion group](https://github.com/orgs/KernelTuner/discussions): General questions, Q&A, thoughts

Contributions are welcome! For feature requests, bug reports, or usage problems, please feel free to create an issue.
For more extensive contributions, check the [contribution guide](http://kerneltuner.github.io/kernel_tuner/stable/contributing.html).

## Citation

If you use Kernel Tuner in research or research software, please cite the most relevant among the
[publications on Kernel Tuner](https://kerneltuner.github.io/kernel_tuner/stable/#citation). To refer to the project as a whole, please cite:

```bibtex
@article{kerneltuner,
  author  = {Ben van Werkhoven},
  title   = {Kernel Tuner: A search-optimizing GPU code auto-tuner},
  journal = {Future Generation Computer Systems},
  year    = {2019},
  volume  = {90},
  pages   = {347--358},
  url     = {https://www.sciencedirect.com/science/article/pii/S0167739X18313359},
  doi     = {10.1016/j.future.2018.08.004}
}
```