
Commit 41907c0 (merge commit: "Fixing conflicts", 2 parents: 64bb153 + bae3855)


71 files changed: +5534, -576 lines

.github/workflows/pre-commit.yml

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ jobs:
         submodules: recursive

     - name: Pre-commit
-      uses: pre-commit/action@v3.0.0
+      uses: pre-commit/action@v3.0.1
       with:
         extra_args: --hook-stage manual --all-files

.github/workflows/pypi-publish.yml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ jobs:
     - name: Build SDist and Wheel
       run: pipx run build --sdist --wheel

-    - uses: actions/upload-artifact@v3
+    - uses: actions/upload-artifact@v4
       with:
         path: dist/*.*

.github/workflows/test-sphinx.yml

Lines changed: 1 addition & 1 deletion
@@ -22,6 +22,6 @@ jobs:
       with:
         pre-build-command: "git config --system --add safe.directory '*'"
         docs-folder: "docs/"
-    - uses: actions/upload-artifact@v3
+    - uses: actions/upload-artifact@v4
       with:
         path: docs/_build/html

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte

 repos:
   - repo: https://github.com/psf/black
-    rev: 23.11.0
+    rev: 24.2.0
     hooks:
       - id: black
         language_version: python3

@@ -24,13 +24,13 @@ repos:
       - id: trailing-whitespace

   - repo: https://github.com/PyCQA/isort
-    rev: 5.12.0
+    rev: 5.13.2
     hooks:
       - id: isort
         args: ["--profile", "black", --line-length=125]

   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.15.0
+    rev: v3.15.1
     hooks:
       - id: pyupgrade
         args: ["--py36-plus"]

@@ -41,7 +41,7 @@ repos:
       - id: setup-cfg-fmt

   - repo: https://github.com/pycqa/flake8
-    rev: 6.1.0
+    rev: 7.0.0
     hooks:
       - id: flake8
         exclude: docs/conf.py

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ type: software
 authors:
   - given-names: "FastML Team"
 title: "hls4ml"
-version: "v0.8.0"
+version: "v0.8.1"
 doi: 10.5281/zenodo.1201549
 repository-code: "https://github.com/fastmachinelearning/hls4ml"
 url: "https://fastmachinelearning.org/hls4ml"

README.md

Lines changed: 2 additions & 2 deletions
@@ -69,7 +69,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.8.0},
+   version = {v0.8.1},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }

@@ -142,7 +142,7 @@ Please use the following text for this acknowledgment:
 > We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \<names of individuals\>, in particular, were important for the development of this project.

 # Funding
-We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for <a href="https://a3d3.ai">Accelerating AI Algorithms for Data Driven Discovery (A3D3)</a> under Cooperative Agreement No. <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997">OAC-2117997</a>, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (<a href="https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf">DE-FOA-0002501</a>), DOE Office of Science, Office of High Energy Physics Early Career Research Program (<a href="https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10">DE-SC0021187</a>, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. <a href="https://doi.org/10.3030/772369">772369</a>).
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for <a href="https://a3d3.ai">Accelerating AI Algorithms for Data Driven Discovery (A3D3)</a> under Cooperative Agreement No. <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997">PHY-2117997</a>, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (<a href="https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf">DE-FOA-0002501</a>), DOE Office of Science, Office of High Energy Physics Early Career Research Program (<a href="https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10">DE-SC0021187</a>, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. <a href="https://doi.org/10.3030/772369">772369</a>).

 <p align="center">
   <img src="https://github.com/fastmachinelearning/hls4ml/assets/29201053/bd1217d4-9930-47b7-8917-ad3fc430c75d" alt="A3D3" width="130"/>

contrib/kl_layer/kl_layer.py

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@

 The HLS part is in contrib/kl_layer/kl_layer.h
 """
+
 from pathlib import Path

 import numpy as np

docs/advanced/model_optimization.rst

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
=================================
Hardware-aware Optimization API
=================================

Pruning and weight sharing are effective techniques for reducing the model footprint and computational requirements. The hls4ml Optimization API introduces hardware-aware pruning and weight sharing.
By defining custom objectives, the algorithm solves a Knapsack optimization problem aimed at maximizing model performance while keeping the target resource(s) at a minimum. Out-of-the-box objectives include network sparsity, GPU FLOPs, Vivado DSPs, memory utilization, etc.

The code blocks below showcase three use cases of the hls4ml Optimization API - network sparsity (unstructured pruning), GPU FLOPs (structured pruning) and Vivado DSP utilization (pattern pruning). First, we start with unstructured pruning:

.. code-block:: Python

    import numpy as np
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.metrics import CategoricalAccuracy
    from tensorflow.keras.losses import CategoricalCrossentropy
    from hls4ml.optimization.keras import optimize_model
    from hls4ml.optimization.keras.utils import get_model_sparsity
    from hls4ml.optimization.attributes import get_attributes_from_keras_model
    from hls4ml.optimization.objectives import ParameterEstimator
    from hls4ml.optimization.scheduler import PolynomialScheduler

    # Define baseline model and load data
    # X_train, y_train = ...
    # X_val, y_val = ...
    # X_test, y_test = ...
    # baseline_model = ...

    # Evaluate baseline model
    y_baseline = baseline_model.predict(X_test)
    acc_base = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_baseline, axis=1))
    sparsity, layers = get_model_sparsity(baseline_model)
    print(f'Baseline Keras accuracy: {acc_base}')
    print(f'Baseline Keras sparsity, overall: {sparsity}')
    print(f'Baseline Keras sparsity, per-layer: {layers}')

    # Define training parameters
    # "epochs" is the maximum number of epochs to train the model after imposing some sparsity
    # If the model is pre-trained, a good rule of thumb is to use between 1/3 and 1/2 of the number of epochs used to train the baseline model
    epochs = 10
    batch_size = 128
    optimizer = Adam()
    loss_fn = CategoricalCrossentropy(from_logits=True)

    # Define the metric to monitor, as well as whether it is increasing or decreasing
    # This distinction allows us to optimize both regression and classification models:
    # e.g. minimize validation MSE for regression, maximize accuracy for classification
    metric, increasing = CategoricalAccuracy(), True
    # Relative tolerance (rtol) bounds the loss in the metric the optimized model is allowed to incur,
    # relative to the baseline metric
    rtol = 0.975

    # A scheduler defines how the sparsity is incremented at each step
    # In this case, sparsity is increased at a polynomially decaying rate, up to a final sparsity of 50%
    # If the final sparsity is unspecified, it is set to 100%
    # The optimization algorithm stops when either (i) the relative drop in performance exceeds the tolerance or (ii) the final sparsity is reached
    scheduler = PolynomialScheduler(5, final_sparsity=0.5)

    # Get model attributes
    model_attributes = get_attributes_from_keras_model(baseline_model)

    # Optimize model
    # ParameterEstimator is the objective and, in this case, the objective is to minimize the total number of parameters
    optimized_model = optimize_model(
        baseline_model, model_attributes, ParameterEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    sparsity, layers = get_model_sparsity(optimized_model)
    print(f'Optimized Keras accuracy: {acc_optimized}')
    print(f'Optimized Keras sparsity, overall: {sparsity}')
    print(f'Optimized Keras sparsity, per-layer: {layers}')
In a similar manner, it is possible to target GPU FLOPs or Vivado DSPs. However, in that case, sparsity is not equivalent to model sparsity.
Instead, it is the sparsity of the target resource. As an example, starting with a network utilizing 512 DSPs and a final sparsity of 50%, the optimized network will use 256 DSPs.

To optimize GPU FLOPs, the code is similar to the above:

.. code-block:: Python

    from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator

    # Optimize model
    # Note the change from ParameterEstimator to GPUFLOPEstimator
    optimized_model = optimize_model(
        baseline_model, model_attributes, GPUFLOPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc_optimized}')
    # Note the difference in the total number of parameters
    # Optimizing GPU FLOPs is equivalent to removing entire structures (filters, neurons) from the network
    print(baseline_model.summary())
    print(optimized_model.summary())
Finally, optimizing Vivado DSPs is possible, given an hls4ml config:

.. code-block:: Python

    from hls4ml.utils.config import config_from_keras_model
    from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator

    # Note the change from optimize_model to optimize_keras_model_for_hls4ml
    # optimize_keras_model_for_hls4ml acts as a wrapper around optimize_model, parsing the hls4ml config into model attributes
    from hls4ml.optimization import optimize_keras_model_for_hls4ml

    # Create hls4ml config
    default_reuse_factor = 4
    default_precision = 'ac_fixed<16, 6>'
    hls_config = config_from_keras_model(
        baseline_model, granularity='name',
        default_precision=default_precision, default_reuse_factor=default_reuse_factor
    )
    hls_config['IOType'] = 'io_parallel'
    hls_config['Model']['Strategy'] = 'Resource'  # Strategy must be present for optimisation

    # Optimize model
    # Note the change from ParameterEstimator to VivadoDSPEstimator,
    # and that the wrapper takes the hls4ml config (hls_config) rather than model attributes
    optimized_model = optimize_keras_model_for_hls4ml(
        baseline_model, hls_config, VivadoDSPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc_optimized}')
There are two more Vivado "optimizers": VivadoFFEstimator, aimed at reducing register utilisation, and VivadoMultiObjectiveEstimator, aimed at optimising BRAM and DSP utilisation; a short sketch of the latter follows after the next code block.
Note that, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesising the HLS, by modifying the config:

.. code-block:: Python

    hls_config = config_from_keras_model(optimized_model)
    hls_config['Model']['DenseResourceImplementation'] = 'Unrolled'
    # Any additional hls4ml config, such as strategy, reuse factor, etc.
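
For illustration, here is a minimal sketch of swapping in the multi-objective estimator. It assumes VivadoMultiObjectiveEstimator is importable from hls4ml.optimization.objectives.vivado_objectives alongside VivadoDSPEstimator, and it reuses the data, scheduler, training parameters and unrolled config defined above:

.. code-block:: Python

    # A minimal sketch: assumes VivadoMultiObjectiveEstimator lives alongside
    # VivadoDSPEstimator in hls4ml.optimization.objectives.vivado_objectives
    from hls4ml.optimization.objectives.vivado_objectives import VivadoMultiObjectiveEstimator

    # Same wrapper call as for VivadoDSPEstimator above; only the objective changes
    optimized_model = optimize_keras_model_for_hls4ml(
        baseline_model, hls_config, VivadoMultiObjectiveEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )

    # With the unrolled Dense config set above, the optimized model can then be
    # converted as usual (output_dir is a placeholder choice)
    import hls4ml
    hls_model = hls4ml.converters.convert_from_keras_model(
        optimized_model, hls_config=hls_config, backend='Vivado', output_dir='hls_prj'
    )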

docs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,7 @@
    advanced/fifo_depth
    advanced/extension
    advanced/accelerator
+   advanced/model_optimization

 .. toctree::
    :hidden:

@@ -34,6 +35,7 @@
    autodoc/hls4ml.backends
    autodoc/hls4ml.converters
    autodoc/hls4ml.model
+   autodoc/hls4ml.optimization
    autodoc/hls4ml.report
    autodoc/hls4ml.utils
    autodoc/hls4ml.writer

docs/reference.rst

Lines changed: 20 additions & 7 deletions
@@ -14,7 +14,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.8.0},
+   version = {v0.8.1},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }

@@ -86,6 +86,19 @@ binary/ternary networks:
       year = "2021"
    }

+optimization API:
+
+.. code-block:: bibtex
+
+   @article{Ramhorst:2023fpga,
+      author = "Benjamin Ramhorst and others",
+      title = "{FPGA Resource-aware Structured Pruning for Real-Time Neural Networks}",
+      eprint = "2308.05170",
+      archivePrefix = "arXiv",
+      primaryClass = "cs.AR",
+      year = "2023"
+   }
+
 Acknowledgments
 ===============
 If you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.

@@ -96,22 +109,22 @@ Please use the following text for this acknowledgment:

 Funding
 =======
-We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for `Accelerating AI Algorithms for Data Driven Discovery (A3D3) <https://a3d3.ai>`_ under Cooperative Agreement No. `OAC-2117997 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997>`_, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (`DE-FOA-0002501 <https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf>`_), DOE Office of Science, Office of High Energy Physics Early Career Research Program (`DE-SC0021187 <https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10>`_, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. `772369 <https://doi.org/10.3030/772369>`_).
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for `Accelerating AI Algorithms for Data Driven Discovery (A3D3) <https://a3d3.ai>`_ under Cooperative Agreement No. `PHY-2117997 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997>`_, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (`DE-FOA-0002501 <https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf>`_), DOE Office of Science, Office of High Energy Physics Early Career Research Program (`DE-SC0021187 <https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10>`_, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. `772369 <https://doi.org/10.3030/772369>`_).

-.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/d4b6e2a3-3537-4413-9809-8153a7d624d6
-   :height: 200
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/29201053/bd1217d4-9930-47b7-8917-ad3fc430c75d
+   :height: 130
    :align: center

 .. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/16e77374-9829-40a8-800e-8d12018a7cb3
-   :height: 200
+   :height: 130
    :align: center

 .. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/de6ca6ea-4d1c-4c56-9d93-f759914bbbf9
-   :height: 200
+   :height: 130
    :align: center

 .. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/7a369971-a381-4bb8-932a-7162b173cbac
-   :height: 200
+   :height: 130
    :align: center

 Contributors
