KernelTuner
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 1 addition & 0 deletions
diff --git a/INSTALL.rst b/INSTALL.rst
Lines changed: 11 additions & 16 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/doc/source/backends.rst b/doc/source/backends.rst
Lines changed: 1 addition & 1 deletion
diff --git a/doc/source/design.rst b/doc/source/design.rst
Lines changed: 1 addition & 1 deletion
diff --git a/examples/cuda/vector_add_observers_pmt.py b/examples/cuda/vector_add_observers_pmt.py
Lines changed: 2 additions & 2 deletions
diff --git a/examples/cuda/vector_add_tegra_observer.py b/examples/cuda/vector_add_tegra_observer.py
Lines changed: 46 additions & 0 deletions
diff --git a/examples/directives/matrix_multiply_c_openacc.py b/examples/directives/matrix_multiply_c_openacc.py
Lines changed: 57 additions & 0 deletions
diff --git a/examples/hip/test_vector_add.py b/examples/hip/test_vector_add.py
Lines changed: 3 additions & 3 deletions
diff --git a/examples/hip/vector_add.py b/examples/hip/vector_add.py
Lines changed: 3 additions & 3 deletions
@@ -3,6 +3,7 @@ All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).
 
 ## Unreleased
+- Changed HIP Python bindings from pyhip-interface to the official hip-python
 
 ## [1.0.0] - 2024-04-04
 - HIP backend to support tuning HIP kernels on AMD GPUs
 
@@ -124,31 +124,26 @@ Or you could install Kernel Tuner and PyOpenCL together if you haven't done so a
 
 If this fails, please see the PyOpenCL installation guide (https://wiki.tiker.net/PyOpenCL/Installation)
 
-HIP and PyHIP
+HIP and HIP Python
 -------------
 
-Before we can install PyHIP, you'll need to have the HIP runtime and compiler installed on your system.
+Before you can install HIP Python, you'll need to have the HIP runtime and compiler installed on your system.
 The HIP compiler is included as part of the ROCm software stack. Here is AMD's installation guide:
 
 * `ROCm Documentation: HIP Installation Guide <https://docs.amd.com/bundle/HIP-Installation-Guide-v5.3/page/Introduction_to_HIP_Installation_Guide.html>`__
 
-After you've installed HIP, you will need to install PyHIP. Run the following command in your terminal to install:
+After you've installed HIP, you will need to install HIP Python.
 
-.. code-block:: bash
-
-    pip install pyhip-interface
+First identify the first three digits of the version number of your ROCm™ installation.
+Then install the HIP Python package(s) as follows:
 
-Alternatively, you can install PyHIP from the source code. First, clone the repository from GitHub:
-
-.. code-block:: bash
+.. code-block:: shell
 
-    git clone https://github.com/jatinx/PyHIP
-
-Then, navigate to the repository directory and run the following command to install:
-
-.. code-block:: bash
+    python3 -m pip install -i https://test.pypi.org/simple hip-python~=$rocm_version
+    # if you want to install the CUDA Python interoperability package too, run:
+    python3 -m pip install -i https://test.pypi.org/simple hip-python-as-cuda~=$rocm_version
 
-    python setup.py install
+For other installation options, see `hip-python on GitHub <https://github.com/ROCm/hip-python>`_.
 
 Installing the git version
 --------------------------
@@ -171,7 +166,7 @@ The runtime dependencies are:
 
 - `cuda`: install pycuda along with kernel_tuner
 - `opencl`: install pycuda along with kernel_tuner
-- `hip`: install pyhip along with kernel_tuner
+- `hip`: install HIP Python along with kernel_tuner
 - `tutorial`: install packages required to run the guides
 
 These can be installed by appending e.g. ``-E cuda -E opencl -E hip``.
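
Review note on the INSTALL.rst change: the new instructions rely on a `$rocm_version` shell variable holding the first three components of the ROCm version, but do not show how to derive it. The sketch below is one possible way to do that in Python. The helper names `rocm_version_prefix` and `pip_requirement` and the sample version string `6.0.2-115` are hypothetical and not part of Kernel Tuner or HIP Python. Under PEP 440, a specifier such as `hip-python~=6.0.2` is a compatible-release clause, equivalent to `>=6.0.2, ==6.0.*`.

```python
import re


def rocm_version_prefix(version: str) -> str:
    """Return the leading 'major.minor.patch' part of a ROCm version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"cannot parse ROCm version: {version!r}")
    return ".".join(match.groups())


def pip_requirement(version: str, package: str = "hip-python") -> str:
    """Build a PEP 440 compatible-release requirement for the given package."""
    return f"{package}~={rocm_version_prefix(version)}"


# Hypothetical version string; use whatever your ROCm installation reports.
print(pip_requirement("6.0.2-115"))
```

The resulting string can be passed to `python3 -m pip install -i https://test.pypi.org/simple` in place of `hip-python~=$rocm_version`.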
 
@@ -32,7 +32,7 @@ What Kernel Tuner does:
 
 ## Installation
 
-- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-pyhipl) compiler installed
+- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-hip-python) compiler installed
 - Then type: `pip install kernel_tuner[cuda]`, `pip install kernel_tuner[opencl]`, or `pip install kernel_tuner[hip]`
 - or why not all of them: `pip install kernel_tuner[cuda,opencl,hip]`
 
 
@@ -58,7 +58,7 @@ used to compile the kernels.
   :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
   :widths: auto
 
-  Python package,      "pycuda", "cupy", "cuda-python", "pyhip-interface"
+  Python package,      "pycuda", "cupy", "cuda-python", "hip-python"
   Selected with lang=, "CUDA", "CUPY", "NVCUDA", "HIP"
   Compiler used,       "nvcc", "nvrtc", "nvrtc", "hiprtc"
 
 
@@ -49,7 +49,7 @@ building blocks for implementing runners.
 The observers are explained in :ref:`observers`.
 
 At the bottom, the backends are shown.
-PyCUDA, CuPy, cuda-python, PyOpenCL and PyHIP are for tuning either CUDA, OpenCL, or HIP kernels.
+PyCUDA, CuPy, cuda-python, PyOpenCL and HIP Python are for tuning either CUDA, OpenCL, or HIP kernels.
 The CompilerFunctions implementation can call any compiler, typically NVCC
 or GCC is used. There is limited support for tuning Fortran kernels.
 This backend was created not just to be able to tune C
 
@@ -31,10 +31,10 @@ def tune():
     tune_params = dict()
     tune_params["block_size_x"] = [128+64*i for i in range(15)]
 
-    pmtobserver = PMTObserver(["nvml", "rapl"])
+    pmtobserver = PMTObserver([("nvidia", 0), "rapl"])
 
     metrics = OrderedDict()
-    metrics["GPU W"] = lambda p: p["nvml_power"]
+    metrics["GPU W"] = lambda p: p["nvidia_power"]
     metrics["CPU W"] = lambda p: p["rapl_power"]
 
     results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, observers=[pmtobserver], metrics=metrics, iterations=32)
 
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+"""Vector add example that records the GPU core frequency using the TegraObserver"""
+
+import json
+
+import numpy
+from kernel_tuner import tune_kernel
+from kernel_tuner.observers.tegra import TegraObserver
+
+def tune():
+
+    kernel_string = """
+    __global__ void vector_add(float *c, float *a, float *b, int n) {
+        int i = blockIdx.x * block_size_x + threadIdx.x;
+        if (i<n) {
+            c[i] = a[i] + b[i];
+        }
+    }
+    """
+
+    size = 800000
+
+    a = numpy.random.randn(size).astype(numpy.float32)
+    b = numpy.random.randn(size).astype(numpy.float32)
+    c = numpy.zeros_like(b)
+    n = numpy.int32(size)
+
+    args = [c, a, b, n]
+
+    tune_params = dict()
+    tune_params["block_size_x"] = [128+64*i for i in range(15)]
+
+    tegraobserver = TegraObserver(["core_freq"])
+
+    metrics = dict()
+    metrics["f"] = lambda p: p["core_freq"]
+
+    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, observers=[tegraobserver], metrics=metrics)
+
+    print(results)
+
+    return results
+
+
+if __name__ == "__main__":
+    tune()
@@ -0,0 +1,57 @@
+#!/usr/bin/env python
+"""This is an example tuning a naive matrix multiplication using the simplified directives interface"""
+
+from kernel_tuner import tune_kernel
+from kernel_tuner.utils.directives import (
+    Code,
+    OpenACC,
+    Cxx,
+    process_directives
+)
+
+N = 4096
+
+code = """
+#define N 4096
+
+void matrix_multiply(float *A, float *B, float *C) {
+    #pragma tuner start mm A(float*:NN) B(float*:NN) C(float*:NN)
+    float temp_sum = 0.0f;
+    #pragma acc parallel vector_length(nthreads)
+    #pragma acc loop gang collapse(2)
+    for ( int i = 0; i < N; i++) {
+        for ( int j = 0; j < N; j++ ) {
+            temp_sum = 0.0f;
+            #pragma acc loop vector reduction(+:temp_sum)
+            for ( int k = 0; k < N; k++ ) {
+                temp_sum += A[(i * N) + k] * B[(k * N) + j];
+            }
+            C[(i * N) + j] = temp_sum;
+        }
+    }
+    #pragma tuner stop
+}
+"""
+
+# Extract tunable directive
+app = Code(OpenACC(), Cxx())
+dims = {"NN": N**2}
+kernel_string, kernel_args = process_directives(app, code, user_dimensions=dims)
+
+tune_params = dict()
+tune_params["nthreads"] = [32 * i for i in range(1, 33)]
+metrics = dict()
+metrics["time_s"] = lambda x: x["time"] / 10**3
+metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
+metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9
+
+tune_kernel(
+    "mm",
+    kernel_string["mm"],
+    0,
+    kernel_args["mm"],
+    tune_params,
+    metrics=metrics,
+    compiler_options=["-fast", "-acc=gpu"],
+    compiler="nvc++",
+)
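
Review note on the metrics in this example: `GB/s` and `GFLOP/s` read `x["time_s"]`, which is itself a user-defined metric, so the definitions only work if metrics are evaluated in insertion order and each one can see the results of the earlier ones. The sketch below emulates that behaviour in plain Python. `apply_metrics` is a hypothetical helper written for illustration, not Kernel Tuner's implementation, and the 250 ms timing is made up.

```python
N = 4096

# Same metric definitions as in the example above.
metrics = dict()
metrics["time_s"] = lambda x: x["time"] / 10**3
metrics["GB/s"] = lambda x: ((N**3 * 2 * 4) + (N**2 * 4)) / x["time_s"] / 10**9
metrics["GFLOP/s"] = lambda x: (N**3 * 3) / x["time_s"] / 10**9


def apply_metrics(result, metrics):
    """Evaluate metrics in insertion order; later metrics may use earlier ones."""
    enriched = dict(result)
    for name, fn in metrics.items():
        enriched[name] = fn(enriched)
    return enriched


# Hypothetical benchmark result: "time" is in milliseconds.
example = apply_metrics({"nthreads": 128, "time": 250.0}, metrics)
for name in metrics:
    print(name, example[name])
```

If `GB/s` were defined before `time_s`, the lookup of `x["time_s"]` would fail with a `KeyError` in this sketch, which is why the order of the definitions matters.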
@@ -5,11 +5,11 @@
 from kernel_tuner import run_kernel
 import pytest
 
-#Check pyhip is installed and if a HIP capable device is present, if not skip the test
+# Check that HIP Python is installed and a HIP-capable device is present; if not, skip the test
 try:
-    from pyhip import hip, hiprtc
+    from hip import hip, hiprtc
 except ImportError:
-    pytest.skip("PyHIP not installed or PYTHONPATH does not includes PyHIP")
+    pytest.skip("HIP Python not installed or PYTHONPATH does not include HIP Python", allow_module_level=True)
     hip = None
     hiprtc = None
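
Review note on the import guard: an alternative to the `try`/`except ImportError` block is to check whether the package can be found without importing it. The sketch below uses only the standard library. `module_available` is a hypothetical helper, not part of Kernel Tuner or pytest, and it only inspects top-level module names.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# HIP Python exposes its bindings under the top-level package "hip".
HIP_PYTHON_AVAILABLE = module_available("hip")

# In a test module this flag could drive a skip marker, for example:
#   pytestmark = pytest.mark.skipif(not HIP_PYTHON_AVAILABLE,
#                                   reason="HIP Python not installed")
print("HIP Python available:", HIP_PYTHON_AVAILABLE)
```

This only confirms that the package is installed; detecting a HIP-capable device still requires a call into the HIP runtime.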
 
 
@@ -30,8 +30,8 @@ def tune():
     tune_params = OrderedDict()
     tune_params["block_size_x"] = [128+64*i for i in range(15)]
 
-    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, lang="HIP", 
-                               cache="vector_add_cache.json", log=logging.DEBUG)
+    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params, lang="HIP",
+                               log=logging.DEBUG)
 
     # Store the metadata of this run
     store_metadata_file("vector_add-metadata.json")
@@ -40,4 +40,4 @@ def tune():
 
 
 if __name__ == "__main__":
-    tune()
+    tune()