
Commit a8cad07

Merge branch 'main' into update_qonnx_test
2 parents 068ae24 + c2a75fd commit a8cad07

119 files changed, +3216 -935 lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-A# Description
+# Description
 
 > :memo: Please include a summary of the change.
 >

.pre-commit-config.yaml

Lines changed: 10 additions & 10 deletions
@@ -2,20 +2,26 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte
 
 repos:
 - repo: https://github.com/psf/black
-  rev: 24.10.0
+  rev: 25.1.0
   hooks:
   - id: black
     language_version: python3
     args: ['--line-length=125',
            '--skip-string-normalization']
 
+- repo: https://github.com/tox-dev/pyproject-fmt
+  rev: v2.5.1
+  hooks:
+  - id: pyproject-fmt
+
 - repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v5.0.0
   hooks:
   - id: check-added-large-files
   - id: check-case-conflict
   - id: check-merge-conflict
   - id: check-symlinks
+  - id: check-toml
   - id: check-yaml
   - id: debug-statements
   - id: end-of-file-fixer
@@ -24,24 +30,18 @@ repos:
   - id: trailing-whitespace
 
 - repo: https://github.com/PyCQA/isort
-  rev: 5.13.2
+  rev: 6.0.1
   hooks:
   - id: isort
-    args: ["--profile", "black", --line-length=125]
 
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.19.0
+  rev: v3.19.1
   hooks:
   - id: pyupgrade
     args: ["--py36-plus"]
 
-- repo: https://github.com/asottile/setup-cfg-fmt
-  rev: v2.7.0
-  hooks:
-  - id: setup-cfg-fmt
-
 - repo: https://github.com/pycqa/flake8
-  rev: 7.1.1
+  rev: 7.1.2
   hooks:
   - id: flake8
     exclude: docs/conf.py

MANIFEST.in

Lines changed: 4 additions & 2 deletions
@@ -1,7 +1,9 @@
-include LICENSE README.md CONTRIBUTING.md CITATION.cff pyproject.toml setup.py setup.cfg .clang-format
+include LICENSE README.md CONTRIBUTING.md CITATION.cff pyproject.toml .clang-format
 graft example-models
 graft test
 graft contrib
 recursive-include hls4ml/templates *
-global-exclude .git .gitmodules .gitlab-ci.yml
+recursive-include hls4ml *.py
+recursive-include hls4ml/contrib *
+global-exclude .git .gitmodules .gitlab-ci.yml *.pyc
 include hls4ml/backends/vivado_accelerator/supported_boards.json

README.md

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ hls4ml.report.read_vivado_report('my-hls-test')
 
 # FAQ
 
-List of frequently asked questions and common HLS synthesis can be found [here](https://fastmachinelearning.org/hls4ml/faq.html)
+List of frequently asked questions and common HLS synthesis can be found [here](https://fastmachinelearning.org/hls4ml/intro/faq.html)
 
 # Citation
 If you use this software in a publication, please cite the software

docs/advanced/fifo_depth.rst

Lines changed: 23 additions & 8 deletions
@@ -5,28 +5,29 @@ FIFO Buffer Depth Optimization
 With the ``io_stream`` IO type, each layer is connected with the subsequent layer through first-in first-out (FIFO) buffers.
 The implementation of the FIFO buffers contributes to the overall resource utilization of the design, impacting in particular the BRAM or LUT utilization.
 Because the neural networks can have complex architectures generally, it is hard to know a priori the correct depth of each FIFO buffer.
-By default ``hls4ml`` choses the most conservative possible depth for each FIFO buffer, which can result in a an unnecessary overutilization of resources.
+By default ``hls4ml`` chooses the most conservative possible depth for each FIFO buffer, which can result in an unnecessary over-utilization of resources.
 
-In order to reduce the impact on the resources used for FIFO buffer implementation, an optimization has been developed in `#509 <https://github.com/fastmachinelearning/hls4ml/pull/509>`_ that correctly sizes the depth of the FIFO buffers by analyzing the RTL cosimulation.
-We implemented this FIFO buffer resizing as a :py:class:`~hls4ml.backends.vivado.passes.fifo_depth_optimization` optimizer pass.
+In order to reduce the impact on the resources used for FIFO buffer implementation, an optimization flow has been developed that correctly sizes the depth
+of the FIFO buffers by analyzing the RTL co-simulation. This feature is currently available in ``Vitis`` and ``Vivado`` backends.
+
+In ``Vivado`` backend, FIFO buffer resizing is implemented as a :py:class:`~hls4ml.backends.vivado.passes.fifo_depth_optimization` optimizer pass.
 Through RTL simulation with large FIFO buffers (by default set to a depth of 100,000), we estimate the maximum occupation of each FIFO.
 Once the maximum depth is determined, the optimizer pass sets the FIFO buffer depth to that value plus 1.
 
-As an example, we show below how to use the optimizer pass, inspired by this `GitHub Gist <https://gist.github.com/nicologhielmetti/3a268be32755448920e9f7d5c78a76d8>`_.
-First, we can define a simple neural network in Keras
+Below we show an example of the use of the FIFO depth optimization. First, we can define a simple neural network in Keras:
 
 .. code-block:: Python
 
     from tensorflow.keras.layers import Dense
     from tensorflow.keras.models import Sequential
 
     model = Sequential()
-    model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu')
+    model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
     model.add(Dense(32, name='fc2', activation='relu'))
     model.add(Dense(32, name='fc3', activation='relu'))
-    model.add(Dense(5, name='fc3', activation='softmax'))
+    model.add(Dense(5, name='fc4', activation='softmax'))
 
-Then, we can convert the model, including the flow
+Then, we can convert the model, including the flow:
 
 .. code-block:: Python
 
@@ -47,3 +48,17 @@ Then, we can convert the model, including the flow
     hls_model.build(reset=False, csim=True, synth=True, cosim=True)
 
 For more details and results, see `H. Borras et al., "Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark" (2022) <https://arxiv.org/abs/2206.11791>`_.
+
+Similarly, the FIFO buffers can be optimized while using the ``Vitis`` backend with the following changes:
+
+.. code-block:: Python
+
+    config['Flows'] = ['vitis:fifo_depth_optimization']
+    hls4ml.model.optimizer.get_optimizer('vitis:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)
+
+    hls_model = hls4ml.converters.convert_from_keras_model(model,
+                                                           io_type='io_stream',
+                                                           hls_config=config,
+                                                           output_dir='hls4mlprj_fifo_depth_opt',
+                                                           part='xc7z020clg400-1',
+                                                           backend='Vitis')
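
The sizing rule described in this documentation change (profile with a very deep FIFO, record the peak occupancy, then set the depth to that peak plus 1) can be illustrated with a toy sketch. The trace format and the helper names `max_occupancy` and `optimized_depth` are hypothetical and are not part of hls4ml; in the real flow the occupancy is taken from the RTL co-simulation.

```python
def max_occupancy(events):
    """Return the peak number of items held in a FIFO.

    `events` is a hypothetical trace: +1 for a write, -1 for a read.
    """
    level = 0
    peak = 0
    for delta in events:
        level += delta
        if level < 0:
            raise ValueError('read from an empty FIFO')
        peak = max(peak, level)
    return peak


def optimized_depth(events):
    # Rule stated in the documentation: observed maximum occupancy plus 1.
    return max_occupancy(events) + 1


trace = [+1, +1, -1, +1, +1, -1, -1, -1]
print(optimized_depth(trace))
```

For this trace the FIFO never holds more than three items, so the sketch would size it at four entries instead of the profiling depth of 100,000.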

example-models

hls4ml/__init__.py

Lines changed: 0 additions & 30 deletions
@@ -1,33 +1,3 @@
-# Temporary workaround for QKeras installation requirement, will be removed after 1.0.0
-def maybe_install_qkeras():
-    import subprocess
-    import sys
-
-    QKERAS_PKG_NAME = 'QKeras'
-    # QKERAS_PKG_SOURCE = QKERAS_PKG_NAME
-    QKERAS_PKG_SOURCE = 'qkeras@git+https://github.com/fastmachinelearning/qkeras.git'
-
-    def pip_list():
-        p = subprocess.run([sys.executable, '-m', 'pip', 'list'], check=True, capture_output=True)
-        return p.stdout.decode()
-
-    def pip_install(package):
-        subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])
-
-    all_pkgs = pip_list()
-    if QKERAS_PKG_NAME not in all_pkgs:
-        print('QKeras installation not found, installing one...')
-        pip_install(QKERAS_PKG_SOURCE)
-        print('QKeras installed.')
-
-
-try:
-    maybe_install_qkeras()
-except Exception:
-    print('Could not find QKeras installation, make sure you have QKeras installed.')
-
-# End of workaround
-
 from hls4ml import converters, report, utils  # noqa: F401, E402
 
 try:

hls4ml/backends/catapult/passes/conv_stream.py

Lines changed: 26 additions & 31 deletions
@@ -6,7 +6,12 @@ class GenerateConvStreamingInstructions(OptimizerPass):
     '''Generates the instructions for streaming implementation of CNNs'''
 
     def match(self, node):
-        return isinstance(node, (Conv1D, SeparableConv1D, Conv2D, SeparableConv2D))
+        is_match = (
+            isinstance(node, (Conv1D, SeparableConv1D, Conv2D, SeparableConv2D))
+            and node.model.config.get_config_value('IOType').lower() == 'io_stream'
+            and node.get_attr('implementation').lower() == 'encoded'
+        )
+        return is_match
 
     def transform(self, model, node):
         node_class = node.__class__.__name__
@@ -18,35 +23,25 @@ def transform(self, model, node):
             raise Exception(f'Cannot generate instructions for node {node.name} ({node_class})')
 
     def _generate_1d_instructions(self, node):
-        if node.model.config.get_config_value('IOType') == 'io_stream':
-            min_w, instructions = node.model.config.backend.compute_conv1d_instructions(
-                node.get_input_variable().shape[0],
-                node.get_input_variable().shape[1],
-                node.get_attr('filt_width'),
-                node.get_attr('stride_width'),
-            )
-            instructions_str = ','.join(str(i) for i in instructions)
-            node.set_attr('min_width', min_w)
-            node.set_attr('instructions', instructions_str)
-        else:
-            # these are unused; just put dummy values
-            node.set_attr('min_width', node.get_attr('in_width'))
-            node.set_attr('instructions', '0')
+        min_w, instructions = node.model.config.backend.compute_conv1d_instructions(
+            node.get_input_variable().shape[0],
+            node.get_input_variable().shape[1],
+            node.get_attr('filt_width'),
+            node.get_attr('stride_width'),
+        )
+        instructions_str = ','.join(str(i) for i in instructions)
+        node.set_attr('min_width', min_w)
+        node.set_attr('instructions', instructions_str)
 
     def _generate_2d_instructions(self, node):
-        if node.model.config.get_config_value('IOType') == 'io_stream':
-            min_h, min_w, instructions = node.model.config.backend.compute_conv2d_instructions(
-                node.get_input_variable().shape[0],
-                node.get_input_variable().shape[1],
-                node.get_input_variable().shape[2],
-                node.get_attr('filt_height'),
-                node.get_attr('stride_height'),
-            )
-            instructions_str = ','.join(str(i) for i in instructions)
-            node.set_attr('min_height', min_h)
-            node.set_attr('min_width', min_w)
-            node.set_attr('instructions', instructions_str)
-        else:
-            node.set_attr('min_height', node.get_attr('in_height'))
-            node.set_attr('min_width', node.get_attr('in_width'))
-            node.set_attr('instructions', '0')
+        min_h, min_w, instructions = node.model.config.backend.compute_conv2d_instructions(
+            node.get_input_variable().shape[0],
+            node.get_input_variable().shape[1],
+            node.get_input_variable().shape[2],
+            node.get_attr('filt_height'),
+            node.get_attr('stride_height'),
+        )
+        instructions_str = ','.join(str(i) for i in instructions)
+        node.set_attr('min_height', min_h)
+        node.set_attr('min_width', min_w)
+        node.set_attr('instructions', instructions_str)

hls4ml/backends/catapult/passes/convolution_templates.py

Lines changed: 7 additions & 0 deletions
@@ -94,6 +94,9 @@ def format(self, node):
         else:
             params['fill_fn'] = 'FillConv1DBuffer'
 
+        params['min_width'] = node.get_attr('min_width', node.get_attr('in_width'))
+        params['instructions'] = node.get_attr('instructions', '0')
+
         conv_config = self.template.format(**params)
 
         mult_params = self._default_config_params(node)
@@ -210,6 +213,10 @@ def format(self, node):
         else:
             params['fill_fn'] = 'FillConv2DBuffer'
 
+        params['min_height'] = node.get_attr('min_height', node.get_attr('in_height'))
+        params['min_width'] = node.get_attr('min_width', node.get_attr('in_width'))
+        params['instructions'] = node.get_attr('instructions', '0')
+
         conv_config = self.template.format(**params)
 
         mult_params = self._default_config_params(node)
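
The two Catapult changes appear to work together: the streaming pass in conv_stream.py no longer writes dummy values for nodes it does not match, and the template falls back to defaults through the second argument of `get_attr`. A minimal sketch of that fallback is shown here; `FakeNode` and `conv1d_params` are hypothetical stand-ins for illustration, and only the `get_attr(key, default)` call shape is taken from the diff.

```python
class FakeNode:
    """Hypothetical stand-in for a layer node; only get_attr is modeled."""

    def __init__(self, **attrs):
        self.attrs = attrs

    def get_attr(self, key, default=None):
        return self.attrs.get(key, default)


def conv1d_params(node):
    # Mirrors the added template lines: use the streaming attributes when the
    # pass has set them, otherwise fall back to 'in_width' and '0'.
    return {
        'min_width': node.get_attr('min_width', node.get_attr('in_width')),
        'instructions': node.get_attr('instructions', '0'),
    }


untouched = FakeNode(in_width=32)  # the streaming pass did not match this node
streamed = FakeNode(in_width=32, min_width=5, instructions='1,3,7')
print(conv1d_params(untouched))
print(conv1d_params(streamed))
```

Keeping the defaults in the template means the optimizer pass only has to handle the case it actually matches (`io_stream` with the `encoded` implementation).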

hls4ml/backends/fpga/fpga_backend.py

Lines changed: 30 additions & 3 deletions
@@ -94,7 +94,7 @@ def __init__(self, name):
             attrs.append(ConfigurableAttribute('reuse_factor', default=1, description=descriptions.reuse_factor))
             self.attribute_map[layer] = attrs
 
-        # seperable is kind of special because it is effectively two layers that will be split
+        # separable is kind of special because it is effectively two layers that will be split
         for layer in (SeparableConv1D, SeparableConv2D):
             attrs = self.attribute_map.get(layer, [])
             attrs.append(TypeAttribute('depthwise_accum'))
@@ -755,7 +755,7 @@ def generate_conv1d_line_buffer_fn(self, layer_idx, n_partitions, in_W, in_C, ke
 
         generated_code = (
             "template<class data_T, typename CONFIG_T>\n"
-            "class fill_buffer_{index} : public FillConv1DBuffer<data_T, CONFIG_T> {{\n"
+            "class fill_buffer_{index} : public nnet::FillConv1DBuffer<data_T, CONFIG_T> {{\n"
             " public:\n"
             " static void fill_buffer(\n"
             " data_T data[CONFIG_T::in_width * CONFIG_T::n_chan],\n"
@@ -885,7 +885,7 @@ def generate_conv2d_line_buffer_fn(
 
         generated_code = (
             "template<class data_T, typename CONFIG_T>\n"
-            "class fill_buffer_{index} : public FillConv2DBuffer<data_T, CONFIG_T> {{\n"
+            "class fill_buffer_{index} : public nnet::FillConv2DBuffer<data_T, CONFIG_T> {{\n"
             " public:\n"
             " static void fill_buffer(\n"
             " data_T data[CONFIG_T::in_height * CONFIG_T::in_width * CONFIG_T::n_chan],\n"
@@ -913,6 +913,33 @@ def generate_conv2d_line_buffer_fn(
 
         return generated_code
 
+    @staticmethod
+    def permute_config_gen(name: str, shape: tuple[int, ...], perm: tuple[int, ...]):
+        """
+        Generate new shape and perm_strides for a permute operation. Operates by mapping the output index
+        to the input index by:
+        - unravel the output index
+        - map each dimension to the corresponding stride in the input tensor, sum
+        The operation can be expressed as:
+
+        new_shape = tuple(shape[i] for i in perm)
+        strides = np.cumprod((shape[1:] + (1,))[::-1])[::-1]
+        perm_strides = [strides[i] for i in perm]
+        out[index] = inp[np.dot(np.unravel_index(index, new_shape), perm_strides)]
+
+        Args:
+            name (str): The name of the configuration.
+            shape (tuple[int, ...]): The shape of the input tensor.
+            perm (tuple[int, ...]): The permutation of the dimensions.
+
+        Returns:
+            (new_shape, perm_strides) (tuple, tuple): the output shape and permutation strides.
+        """
+        new_shape = tuple(shape[i] for i in perm)
+        strides = np.cumprod((shape[1:] + (1,))[::-1])[::-1]
+        perm_strides = tuple(int(strides[i]) for i in perm)
+        return (new_shape, perm_strides)
+
     @model_optimizer()
     def write_hls(self, model):
         self.writer.write_hls(model)
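
The index mapping described in the `permute_config_gen` docstring can be checked on a small tensor. The sketch below is a NumPy-free rendition written for illustration: `permute_config` and `permute_flat` are hypothetical helpers, not hls4ml functions, and the input is assumed to be stored flat in row-major order.

```python
from itertools import product


def permute_config(shape, perm):
    new_shape = tuple(shape[i] for i in perm)
    # Row-major strides of the input: strides[i] is the product of shape[i+1:].
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    perm_strides = tuple(strides[i] for i in perm)
    return new_shape, perm_strides


def permute_flat(flat_in, shape, perm):
    new_shape, perm_strides = permute_config(shape, perm)
    out = []
    # product() walks the output multi-indices in row-major order, which is
    # the same sequence obtained by unravelling 0..N-1 against new_shape.
    for multi_index in product(*(range(n) for n in new_shape)):
        offset = sum(i * s for i, s in zip(multi_index, perm_strides))
        out.append(flat_in[offset])
    return new_shape, out


# Transposing a 2x3 matrix stored as [0, 1, 2, 3, 4, 5].
new_shape, out = permute_flat(list(range(6)), (2, 3), (1, 0))
print(new_shape, out)
```

Because only `new_shape` and `perm_strides` are needed at run time, the generated HLS code can compute each input offset with a dot product instead of materializing the permutation.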

hls4ml/backends/fpga/fpga_layers.py

Lines changed: 6 additions & 4 deletions
@@ -73,12 +73,14 @@ def set_thresholds(self, scale, bias, ternary_threshold=0.5):
 class PointwiseConv1D(Conv1D):
     '''Optimized Conv1D implementation for 1x1 kernels.'''
 
-    # Nothing to do, will pick up function and config from class name
-    pass
+    def initialize(self):
+        # Do nothing, values copied
+        pass
 
 
 class PointwiseConv2D(Conv2D):
     '''Optimized Conv2D implementation for 1x1 kernels.'''
 
-    # Nothing to do, will pick up function and config from class name
-    pass
+    def initialize(self):
+        # Do nothing, values copied
+        pass

hls4ml/backends/fpga/passes/hgq_proxy_model.py

Lines changed: 3 additions & 1 deletion
@@ -75,10 +75,12 @@ def transform(self, model, node: FixedPointQuantizer):
 class ProcessFixedPointQuantizerCall(FunctionCallTemplate):
     def __init__(self):
         super().__init__(FixedPointQuantizer, include_header=[])
-        self.template = 'nnet::{name}<{input_t}, {output_t}>({input}, {output});'
+        self.template = '{namespace}::{name}<{input_t}, {output_t}>({input}, {output});'
 
     def format(self, node):
         params = self._default_function_params(node)
+        namespace = node.model.config.writer_config.get('Namespace', None) or 'nnet'
+        params['namespace'] = namespace
 
         return self.template.format(**params)
 
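
The `get('Namespace', None) or 'nnet'` idiom in this hunk falls back to `nnet` both when the key is absent and when it is present but empty. A minimal sketch with plain dictionaries follows; `format_call` and the parameter values are hypothetical, and only the template string and the fallback expression are taken from the diff.

```python
TEMPLATE = '{namespace}::{name}<{input_t}, {output_t}>({input}, {output});'


def format_call(writer_config, params):
    # `or 'nnet'` also covers a key that exists but holds None or ''.
    namespace = writer_config.get('Namespace', None) or 'nnet'
    return TEMPLATE.format(namespace=namespace, **params)


# Hypothetical parameter values, chosen only to make the output readable.
params = {
    'name': 'quantize',
    'input_t': 'input_t',
    'output_t': 'result_t',
    'input': 'x',
    'output': 'y',
}

print(format_call({}, params))
print(format_call({'Namespace': None}, params))
print(format_call({'Namespace': 'myproject'}, params))
```

A plain `get('Namespace', 'nnet')` would return `None` for the second call and produce an invalid `None::` prefix, which is presumably why the diff uses `or`.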

hls4ml/backends/oneapi/oneapi_backend.py

Lines changed: 18 additions & 1 deletion
@@ -129,13 +129,30 @@ def get_default_flow(self):
     def get_writer_flow(self):
         return self._writer_flow
 
-    def create_initial_config(self, part='Arria10', clock_period=5, io_type='io_parallel'):
+    def create_initial_config(self, part='Arria10', clock_period=5, io_type='io_parallel', write_tar=False, **_):
+        """Create initial configuration of the oneAPI backend.
+
+        Args:
+            part (str, optional): The FPGA part to be used. Defaults to 'Arria10'.
+            clock_period (int, optional): The clock period. Defaults to 5.
+            io_type (str, optional): Type of implementation used. One of
+                'io_parallel' or 'io_stream'. Defaults to 'io_parallel'.
+            write_tar (bool, optional): If True, compresses the output directory into a .tar.gz file. Defaults to False.
+
+        Returns:
+            dict: initial configuration.
+        """
+
         config = {}
 
         config['Part'] = part if part is not None else 'Arria10'
         config['ClockPeriod'] = clock_period
         config['IOType'] = io_type
         config['HLSConfig'] = {}
+        config['WriterConfig'] = {
+            # TODO: add namespace
+            'WriteTar': write_tar,
+        }
 
         return config
 
141158

hls4ml/backends/oneapi/passes/clone_templates.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,4 @@
1-
""" The clone templates in the fpga backend are not enough for oneAPI, so this adds the missing parts
2-
"""
1+
"""The clone templates in the fpga backend are not enough for oneAPI, so this adds the missing parts"""
32

43
from hls4ml.backends.fpga.passes.clone import Clone
54
from hls4ml.backends.oneapi.oneapi_template import StreamFunctionCallTemplate, TaskSequenceTemplate
