========================
VitisAccelerator Backend
========================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project for `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
It has been tested with the following boards:

* `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
* `Alveo u55c <https://www.xilinx.com/products/boards-and-kits/alveo/u55c.html>`_
* `Alveo u250 <https://www.xilinx.com/products/boards-and-kits/alveo/u250.html>`_
* `Versal vck5000 <https://www.xilinx.com/products/boards-and-kits/vck5000.html>`_

Kernel wrapper
==============

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either the `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or the `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and execute any kernel it contains.

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

Options
=======

As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

 * ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
 * ``num_thread``: Number of host threads used to exercise the kernels in the host application.
 * ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend offers the following options to customize the implementation (a usage sketch follows the list):

 * ``board``: The target board; must match one entry in ``supported_boards.json``.
 * ``clock_period``: The target clock period in ns.
 * ``hw_quant``: Whether arbitrary-precision quantization is performed in hardware. If ``True``, quantization is performed in hardware and floats are used at the kernel interface; otherwise, it is performed in software and arbitrary-precision types are used at the interface. (Defaults to ``False``.)
 * ``vivado_directives``: A list of strings to be added under the ``[Vivado]`` section of the generated ``accelerator_card.cfg`` link configuration file. Can be used to add custom directives to the Vivado project.

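The snippet below is a minimal sketch of how these options can be set, assuming they are passed as keyword arguments of ``convert_from_keras_model`` in the same way as the throughput options in the example at the end of this page; the board name, clock period, and Vivado directive shown are illustrative placeholders.

.. code-block:: Python

    import hls4ml

    # Minimal sketch with illustrative values; `model` and `config` are assumed to be
    # an existing Keras model and its hls4ml configuration.
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='VitisAccelerator',
        board='alveo-u55c',  # must match an entry in supported_boards.json
        clock_period=5,      # target clock period in ns
        hw_quant=False,      # quantize in software; arbitrary-precision types at the kernel interface
        # Illustrative directive added under the [Vivado] section of accelerator_card.cfg
        vivado_directives=['prop=run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=Explore'],
    )
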
Build workflow
==============

When the ``build`` method is called, the following options affect the build process (a call sketch is given after the list):

 * ``reset``: TBD.
 * ``csim``: TBD.
 * ``synth``: TBD.
 * ``cosim``: TBD.
 * ``vsynth``: TBD.
 * ``debug``: TBD.

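A minimal call sketch, assuming (as in other hls4ml backends) that these options are boolean keyword arguments of ``build``; the values below are purely illustrative and do not document the backend's actual defaults.

.. code-block:: Python

    # Illustrative flag values only; see the list above for the meaning of each option.
    hls_model.build(
        reset=False,
        csim=True,
        synth=True,
        cosim=False,
        vsynth=False,
        debug=False,
    )
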
Once the project is generated, it is possible to run the build steps manually by using one of the following ``make`` targets in the generated project directory:

 * ``host``: Compiles the host application.
 * ``hls``: Produces only the kernel's object file.
 * ``xclbin``: Produces only the kernel's ``.xclbin`` file.

It is also possible to run the full build process by calling ``make`` without any target. Modifications to the ``accelerator_card.cfg`` file can be done manually before running the build process (e.g., to change the clock period, or to add additional ``.xo`` kernels to the build).

The generated host code application and the ``.xclbin`` file can be executed as follows:

.. code-block:: Bash

    ./host <myproject>.xclbin

Example
=======

The following example is a modified version of `hls4ml example 7 <https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/part7_deployment.ipynb>`_.

.. code-block:: Python

    import hls4ml

    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='model_3/hls4ml_prj_vitis_accel',
        backend='VitisAccelerator',
        board='alveo-u55c',
        num_kernel=4,
        num_thread=8,
        batchsize=8192
    )
    hls_model.compile()
    hls_model.build()

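Independently of the accelerator build, the compiled model can also be checked numerically in software with the standard hls4ml ``predict`` call, which runs the C emulation of the model on the host rather than on the card. A minimal sketch, assuming ``X_test`` is an existing NumPy array of input samples:

.. code-block:: Python

    import numpy as np

    # Host-side software emulation (C simulation) of the model; X_test is assumed
    # to be a NumPy array with the model's input shape.
    y_emulated = hls_model.predict(np.ascontiguousarray(X_test))
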
By default, the ``build`` method generates all the files necessary to run the kernel on the accelerator board. As this can be a long process, three build targets are available to produce only specific parts of the project:

* ``host``: Compiles the host application.
* ``hls``: Produces only the kernel's object file.
* ``xclbin``: Produces only the kernel's ``.xclbin`` file.

The generated host application and ``.xclbin`` file can then be executed as shown above.