Commit 40847f6

Author: Quentin Berthet
Update documentation.
1 parent 78a234b commit 40847f6

File tree

3 files changed: +95 -48 lines

docs/advanced/vitis_accelerator.rst

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
========================
VitisAccelerator Backend
========================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project for `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
The Vitis accelerator backend has been tested with the following boards:

* `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
* `Alveo u55c <https://www.xilinx.com/products/boards-and-kits/alveo/u55c.html>`_
* `Alveo u250 <https://www.xilinx.com/products/boards-and-kits/alveo/u250.html>`_
* `Versal vck5000 <https://www.xilinx.com/products/boards-and-kits/vck5000.html>`_

Kernel wrapper
==============

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or the `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

Options
=======

As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
* ``num_thread``: Number of host threads used to exercise the kernels in the host application.
* ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend provides the following options to customize the implementation (see the sketch after this list):

* ``board``: The target board; must match one entry in ``supported_boards.json``.
* ``clock_period``: The target clock period in ns.
* ``hw_quant``: Whether arbitrary precision quantization is performed in hardware. If ``True``, the quantization is performed in hardware and floats are used at the kernel interface; otherwise it is performed in software and arbitrary precision types are used at the interface. (Defaults to ``False``.)
* ``vivado_directives``: A list of strings to be added under the ``[Vivado]`` section of the generated ``accelerator_card.cfg`` link configuration file. Can be used to add custom directives to the Vivado project.
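
A minimal sketch of how these customization options might be set, assuming they are passed as keyword arguments to ``convert_from_keras_model`` in the same way as the throughput options in the example section below; ``model`` and ``config`` are assumed to already exist, and all values are purely illustrative:

.. code-block:: Python

    import hls4ml

    # Illustrative only: 'model' is an existing Keras model and 'config' its
    # hls4ml configuration dictionary, as in the example section below.
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='my_prj_vitis_accel',
        backend='VitisAccelerator',
        board='alveo-u55c',   # must match an entry in supported_boards.json
        clock_period=5,       # target clock period in ns (illustrative value)
        hw_quant=False,       # quantize in software; arbitrary precision types at the kernel interface
        # Illustrative custom directive added under the [Vivado] section of accelerator_card.cfg:
        vivado_directives=['prop=run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=Explore'],
    )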

Build workflow
==============

When the ``build`` method is called, the following options affect the build process (a sketch of a possible invocation follows the list):

* ``reset``: TBD.
* ``csim``: TBD.
* ``synth``: TBD.
* ``cosim``: TBD.
* ``vsynth``: TBD.
* ``debug``: TBD.
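
A minimal sketch of a possible ``build`` invocation, assuming these options are boolean keyword arguments as in other hls4ml backends; their exact semantics are still to be documented and the values shown are purely illustrative:

.. code-block:: Python

    # Illustrative only: flag names follow the list above; their exact behavior
    # in the VitisAccelerator backend is still to be documented.
    hls_model.build(
        reset=False,
        csim=True,
        synth=True,
        cosim=False,
        vsynth=False,
        debug=False,
    )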

Once the project is generated, it is possible to run the build steps manually by using one of the following ``make`` targets in the generated project directory:

* ``host``: Compiles the host application.
* ``hls``: Produces only the kernel's object file.
* ``xclbin``: Produces only the kernel's ``.xclbin`` file.

It is also possible to run the full build process by calling ``make`` without any target. Modifications to the ``accelerator_card.cfg`` file can be done manually before running the build process (e.g., to change the clock period, or to add additional ``.xo`` kernels to the build). A minimal scripted sketch of this manual flow follows.
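
This sketch assumes the generated project directory is ``model_3/hls4ml_prj_vitis_accel`` (as in the example below) and that ``make`` and the documented targets are available there:

.. code-block:: Python

    import subprocess

    # Hypothetical project directory generated by the backend.
    prj_dir = 'model_3/hls4ml_prj_vitis_accel'

    # Run individual targets, or call make with no target for the full build.
    subprocess.run(['make', 'host'], cwd=prj_dir, check=True)
    subprocess.run(['make', 'xclbin'], cwd=prj_dir, check=True)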

The generated host code application and the ``.xclbin`` file can be executed as such:

.. code-block:: Bash

    ./host <myproject>.xclbin

Example
=======

The following example is a modified version of `hls4ml example 7 <https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/part7_deployment.ipynb>`_.

.. code-block:: Python

    import hls4ml
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='model_3/hls4ml_prj_vitis_accel',
        backend='VitisAccelerator',
        board='alveo-u55c',
        num_kernel=4,
        num_thread=8,
        batchsize=8192
    )
    hls_model.compile()
    hls_model.build()

By default, the ``build`` method generates all the necessary files to run the kernel on the accelerator board. As this can be a long process, there are three build options that target the generation of specific parts of the project:

* ``host``: Compiles the host application
* ``hls``: Produces only the kernel's object file
* ``xclbin``: Produces only the kernel's ``.xclbin`` file

The generated host code application and the ``.xclbin`` file can be executed as such:

.. code-block:: Bash

    ./host <myproject>.xclbin

docs/advanced/accelerator.rst renamed to docs/advanced/vivado_accelerator.rst

Lines changed: 0 additions & 47 deletions
@@ -75,50 +75,3 @@ The ``predict`` method will send the input data to the PL and return the output
    nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
    y_hw, latency, throughput = nn.predict(X_test, profile=True)

========================
VitisAccelerator Backend
========================

The ``VitsAccelerator`` backned makes use of the vitis kernel flow to and streamlines the generation of an hls4ml project targeting PCIe accelerators.
Vitis accelerator backend supports the following boards:

* `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
* `Alveo u55c <https://www.xilinx.com/products/boards-and-kits/alveo/u55c.html>`_
* `Alveo u250 <https://www.xilinx.com/products/boards-and-kits/alveo/u250.html>`_
* `Versal vck5000 <https://www.xilinx.com/products/boards-and-kits/vck5000.html>`_

The backend also generates an `OpenCL` host code that uploads and runs the kernel on the accelerator card.

Example
=======

The following example is a modified version of `hsl4ml example 7 <https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/part7_deployment.ipynb>`_.

.. code-block:: Python

    import hls4ml
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='model_3/hls4ml_prj_vitis_accel',
        backend='VitisAccelerator',
        board='alveo-u55c',
        num_kernel=4,
        num_thread=8,
        batchsize=8192
    )
    hls_model.compile()
    hls_model.build()

By default the build method generates all the necessary files to run the kernel on the accelerator board. As this can be a long process, there are three build options that target the generation of specific parts of the project:

* `host`: Compiles the host application
* `hls`: Produces only the kernel's object file
* `xclbin`: Produces only the kernel's .xclbin file

The generated host code application and the xclbin file can be executed as such:

.. code-block:: Bash

    ./host <myproject>.xclbin

docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -24,7 +24,8 @@
   advanced/fifo_depth
   advanced/extension
-  advanced/accelerator
+  advanced/vivado_accelerator
+  advanced/vitis_accelerator
   advanced/model_optimization

.. toctree::
