Commit 40ed73f

2 parents b13f972 + 3f32fed commit 40ed73f

27 files changed, +1264 -490 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).

 ## Unreleased
+- changed HIP python bindings from pyhip-interface to the official hip-python

 ## [1.0.0] - 2024-04-04
 - HIP backend to support tuning HIP kernels on AMD GPUs

INSTALL.rst

Lines changed: 11 additions & 16 deletions
@@ -124,31 +124,26 @@ Or you could install Kernel Tuner and PyOpenCL together if you haven't done so a

 If this fails, please see the PyOpenCL installation guide (https://wiki.tiker.net/PyOpenCL/Installation)

-HIP and PyHIP
+HIP and HIP Python
 -------------

-Before we can install PyHIP, you'll need to have the HIP runtime and compiler installed on your system.
+Before we can install HIP Python, you'll need to have the HIP runtime and compiler installed on your system.
 The HIP compiler is included as part of the ROCm software stack. Here is AMD's installation guide:

 * `ROCm Documentation: HIP Installation Guide <https://docs.amd.com/bundle/HIP-Installation-Guide-v5.3/page/Introduction_to_HIP_Installation_Guide.html>`__

-After you've installed HIP, you will need to install PyHIP. Run the following command in your terminal to install:
+After you've installed HIP, you will need to install HIP Python. Run the following command in your terminal to install:

-.. code-block:: bash
-
-    pip install pyhip-interface
+First identify the first three digits of the version number of your ROCm™ installation.
+Then install the HIP Python package(s) as follows:

-Alternatively, you can install PyHIP from the source code. First, clone the repository from GitHub:
-
-.. code-block:: bash
+.. code-block:: shell

-    git clone https://github.com/jatinx/PyHIP
-
-Then, navigate to the repository directory and run the following command to install:
-
-.. code-block:: bash
+    python3 -m pip install -i https://test.pypi.org/simple hip-python~=$rocm_version
+    # if you want to install the CUDA Python interoperability package too, run:
+    python3 -m pip install -i https://test.pypi.org/simple hip-python-as-cuda~=$rocm_version

-    python setup.py install
+For other installation options check `hip-python on GitHub <https://github.com/ROCm/hip-python>`_

 Installing the git version
 --------------------------
@@ -171,7 +166,7 @@ The runtime dependencies are:

 - `cuda`: install pycuda along with kernel_tuner
 - `opencl`: install pycuda along with kernel_tuner
-- `hip`: install pyhip along with kernel_tuner
+- `hip`: install HIP Python along with kernel_tuner
 - `tutorial`: install packages required to run the guides

 These can be installed by appending e.g. ``-E cuda -E opencl -E hip``.
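
The new install instructions use `$rocm_version` without showing how to obtain it. The following is a minimal Python sketch of one way to derive the "first three digits" from a full ROCm version string. The helper names and the default path `/opt/rocm/.info/version` are assumptions for illustration and are not part of this commit; adjust the path to match your installation.

```python
import re
from pathlib import Path


def rocm_version_prefix(version: str) -> str:
    """Return the first three numeric components (major.minor.patch)."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse a ROCm version from {version!r}")
    return ".".join(match.groups())


def read_rocm_version(info_file: str = "/opt/rocm/.info/version") -> str:
    # Assumption: ROCm is installed under /opt/rocm and ships this file.
    return rocm_version_prefix(Path(info_file).read_text())


if __name__ == "__main__":
    # Example with a made-up version string; no ROCm installation required.
    print(rocm_version_prefix("6.0.2-115"))
```

The resulting string is what would be substituted for `$rocm_version` in the `pip install` commands above.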

README.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ What Kernel Tuner does:

 ## Installation

-- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-pyhipl) compiler installed
+- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-hip-python) compiler installed
 - Then type: `pip install kernel_tuner[cuda]`, `pip install kernel_tuner[opencl]`, or `pip install kernel_tuner[hip]`
 - or why not all of them: `pip install kernel_tuner[cuda,opencl,hip]`

doc/source/backends.rst

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ used to compile the kernels.
    :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
    :widths: auto

-   Python package, "pycuda", "cupy", "cuda-python", "pyhip-interface"
+   Python package, "pycuda", "cupy", "cuda-python", "hip-python"
    Selected with lang=, "CUDA", "CUPY", "NVCUDA", "HIP"
    Compiler used, "nvcc", "nvrtc", "nvrtc", "hiprtc"

doc/source/design.rst

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ building blocks for implementing runners.
 The observers are explained in :ref:`observers`.

 At the bottom, the backends are shown.
-PyCUDA, CuPy, cuda-python, PyOpenCL and PyHIP are for tuning either CUDA, OpenCL, or HIP kernels.
+PyCUDA, CuPy, cuda-python, PyOpenCL and HIP Python are for tuning either CUDA, OpenCL, or HIP kernels.
 The CompilerFunctions implementation can call any compiler, typically NVCC
 or GCC is used. There is limited support for tuning Fortran kernels.
 This backend was created not just to be able to tune C

examples/cuda/vector_add_observers_pmt.py

Lines changed: 2 additions & 2 deletions
@@ -31,10 +31,10 @@ def tune():
     tune_params = dict()
     tune_params["block_size_x"] = [128+64*i for i in range(15)]

-    pmtobserver = PMTObserver(["nvml", "rapl"])
+    pmtobserver = PMTObserver([("nvidia", 0), "rapl"])

     metrics = OrderedDict()
-    metrics["GPU W"] = lambda p: p["nvml_power"]
+    metrics["GPU W"] = lambda p: p["nvidia_power"]
     metrics["CPU W"] = lambda p: p["rapl_power"]

     results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, observers=[pmtobserver], metrics=metrics, iterations=32)
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+"""This is the minimal example from the README"""
+
+import json
+
+import numpy
+from kernel_tuner import tune_kernel
+from kernel_tuner.observers.tegra import TegraObserver
+
+def tune():
+
+    kernel_string = """
+    __global__ void vector_add(float *c, float *a, float *b, int n) {
+        int i = blockIdx.x * block_size_x + threadIdx.x;
+        if (i<n) {
+            c[i] = a[i] + b[i];
+        }
+    }
+    """
+
+    size = 800000
+
+    a = numpy.random.randn(size).astype(numpy.float32)
+    b = numpy.random.randn(size).astype(numpy.float32)
+    c = numpy.zeros_like(b)
+    n = numpy.int32(size)
+
+    args = [c, a, b, n]
+
+    tune_params = dict()
+    tune_params["block_size_x"] = [128+64*i for i in range(15)]
+
+    tegraobserver = TegraObserver(["core_freq"])
+
+    metrics = dict()
+    metrics["f"] = lambda p: p["core_freq"]
+
+    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, observers=[tegraobserver], metrics=metrics)
+
+    print(results)
+
+    return results
+
+
+if __name__ == "__main__":
+    tune()
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+#!/usr/bin/env python
+"""This is an example tuning a naive matrix multiplication using the simplified directives interface"""
+
+from kernel_tuner import tune_kernel
+from kernel_tuner.utils.directives import (
+    Code,
+    OpenACC,
+    Cxx,
+    process_directives
+)
+
+N = 4096
+
+code = """
+#define N 4096
+
+void matrix_multiply(float *A, float *B, float *C) {
+    #pragma tuner start mm A(float*:NN) B(float*:NN) C(float*:NN)
+    float temp_sum = 0.0f;
+    #pragma acc parallel vector_length(nthreads)
+    #pragma acc loop gang collapse(2)
+    for ( int i = 0; i < N; i++) {
+        for ( int j = 0; j < N; j++ ) {
+            temp_sum = 0.0f;
+            #pragma acc loop vector reduction(+:temp_sum)
+            for ( int k = 0; k < N; k++ ) {
+                temp_sum += A[(i * N) + k] * B[(k * N) + j];
+            }
+            C[(i * N) + j] = temp_sum;
+        }
+    }
+    #pragma tuner stop
+}
+"""
+
+# Extract tunable directive
+app = Code(OpenACC(), Cxx())
+dims = {"NN": N**2}
+kernel_string, kernel_args = process_directives(app, code, user_dimensions=dims)
+
+tune_params = dict()
+tune_params["nthreads"] = [32 * i for i in range(1, 33)]
+metrics = dict()
+metrics["time_s"] = lambda x: x["time"] / 10**3
+metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
+metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9
+
+tune_kernel(
+    "mm",
+    kernel_string["mm"],
+    0,
+    kernel_args["mm"],
+    tune_params,
+    metrics=metrics,
+    compiler_options=["-fast", "-acc=gpu"],
+    compiler="nvc++",
+)
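
The metrics in this example depend on one another: `GB/s` and `GFLOP/s` both read `time_s`, which is itself a metric derived from `time`. That only works if the metrics are evaluated in the order they were defined, with each value written back before the next one runs. The sketch below illustrates that evaluation order with the same formulas. `apply_metrics` is a hypothetical helper written for this illustration, not Kernel Tuner's internal code, and the measured time is made up.

```python
from collections import OrderedDict

N = 4096

# Same metric definitions as in the example above.
metrics = OrderedDict()
metrics["time_s"] = lambda x: x["time"] / 10**3
metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9


def apply_metrics(result, metrics):
    """Hypothetical helper: evaluate metrics in insertion order and store
    each value back, so later metrics can read earlier ones."""
    enriched = dict(result)
    for name, func in metrics.items():
        enriched[name] = func(enriched)
    return enriched


# Made-up measurement: one configuration that took 2000 time units.
row = apply_metrics({"nthreads": 128, "time": 2000.0}, metrics)
print(row["time_s"], row["GB/s"], row["GFLOP/s"])
```

Defining `GFLOP/s` before `time_s` would raise a `KeyError` in this sketch, which is why the definition order matters.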

examples/hip/test_vector_add.py

Lines changed: 3 additions & 3 deletions
@@ -5,11 +5,11 @@
 from kernel_tuner import run_kernel
 import pytest

-#Check pyhip is installed and if a HIP capable device is present, if not skip the test
+#Check hip is installed and if a HIP capable device is present, if not skip the test
 try:
-    from pyhip import hip, hiprtc
+    from hip import hip, hiprtc
 except ImportError:
-    pytest.skip("PyHIP not installed or PYTHONPATH does not includes PyHIP")
+    pytest.skip("HIP Python not installed or PYTHONPATH does not includes HIP Python")
     hip = None
     hiprtc = None
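
One thing to note about the guard in this hunk: `pytest.skip` works by raising an exception, so the `hip = None` and `hiprtc = None` assignments after it are never reached. pytest also offers `pytest.importorskip` and the `allow_module_level=True` argument of `pytest.skip` for skipping a whole module. The sketch below shows a pytest-free variant of the optional-import pattern. `optional_import` is a hypothetical helper for illustration and is not part of this commit.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


if __name__ == "__main__":
    # "hip" is the import name used by HIP Python in the hunk above.
    hip = optional_import("hip")
    print("HIP Python available:", hip is not None)
```

A test module could then decide to skip based on whether the returned value is `None`.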

examples/hip/vector_add.py

Lines changed: 3 additions & 3 deletions
@@ -30,8 +30,8 @@ def tune():
     tune_params = OrderedDict()
     tune_params["block_size_x"] = [128+64*i for i in range(15)]

-    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, lang="HIP",
-        cache="vector_add_cache.json", log=logging.DEBUG)
+    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, lang="HIP",
+        log=logging.DEBUG)

     # Store the metadata of this run
     store_metadata_file("vector_add-metadata.json")
@@ -40,4 +40,4 @@ def tune():


 if __name__ == "__main__":
-    tune()
+    tune()
