=================================
Hardware-aware Optimization API
=================================

Pruning and weight sharing are effective techniques to reduce model footprint and computational requirements. The hls4ml Optimization API introduces hardware-aware pruning and weight sharing.
By defining custom objectives, the algorithm solves a Knapsack optimization problem aimed at maximizing model performance, while keeping the target resource(s) at a minimum. Out-of-the-box objectives include network sparsity, GPU FLOPs, Vivado DSPs, memory utilization, etc.
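
Schematically, and not necessarily in the exact form used internally, each prunable group of weights :math:`i` can be assigned a utility :math:`u_i` (its estimated contribution to model performance) and a cost :math:`c_i` under the chosen objective, so that selecting which groups to keep becomes a 0-1 Knapsack problem:

.. math::

   \max_{x_i \in \{0, 1\}} \sum_i u_i x_i \quad \text{subject to} \quad \sum_i c_i x_i \leq C,

where :math:`C` is the available budget of the target resource.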

The code block below showcases three use cases of the hls4ml Optimization API - network sparsity (unstructured pruning), GPU FLOPs (structured pruning) and Vivado DSP utilization (pattern pruning). First, we start with unstructured pruning:

.. code-block:: Python
|
    import numpy as np
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.metrics import CategoricalAccuracy
    from tensorflow.keras.losses import CategoricalCrossentropy
    from hls4ml.optimization.keras import optimize_model
    from hls4ml.optimization.keras.utils import get_model_sparsity
    from hls4ml.optimization.attributes import get_attributes_from_keras_model
    from hls4ml.optimization.objectives import ParameterEstimator
    from hls4ml.optimization.scheduler import PolynomialScheduler

    # Define the baseline model and load data
    # X_train, y_train = ...
    # X_val, y_val = ...
    # X_test, y_test = ...
    # baseline_model = ...

    # Evaluate the baseline model
    y_baseline = baseline_model.predict(X_test)
    acc_base = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_baseline, axis=1))
    sparsity, layers = get_model_sparsity(baseline_model)
    print(f'Baseline Keras accuracy: {acc_base}')
    print(f'Baseline Keras sparsity, overall: {sparsity}')
    print(f'Baseline Keras sparsity, per-layer: {layers}')

    # Define training parameters
    # Epochs refers to the maximum number of epochs to train the model for, after imposing some sparsity
    # If the model is pre-trained, a good rule of thumb is to use between 1/3 and 1/2 of the number of epochs used to train the baseline model
    epochs = 10
    batch_size = 128
    optimizer = Adam()
    loss_fn = CategoricalCrossentropy(from_logits=True)

    # Define the metric to monitor, as well as whether it is increasing or decreasing
    # This distinction allows us to optimize both regression and classification models
    # For regression, e.g. minimize validation MSE; for classification, e.g. maximize accuracy
    metric, increasing = CategoricalAccuracy(), True
    # Relative tolerance (rtol) is the relative loss in metric the optimized model is allowed to incur
    rtol = 0.975

    # A scheduler defines how the sparsity is incremented at each step
    # In this case, the maximum sparsity is 50% and it will be applied at a polynomially decreasing rate, for 10 steps
    # If the final sparsity is unspecified, it is set to 100%
    # The optimization algorithm stops either when (i) the relative drop in performance is below the threshold or (ii) the final sparsity is reached
    scheduler = PolynomialScheduler(5, final_sparsity=0.5)
    # Get model attributes
    model_attributes = get_attributes_from_keras_model(baseline_model)

    # Optimize the model
    # ParameterEstimator is the objective and, in this case, the objective is to minimize the total number of parameters
    optimized_model = optimize_model(
        baseline_model, model_attributes, ParameterEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate the optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    sparsity, layers = get_model_sparsity(optimized_model)
    print(f'Optimized Keras accuracy: {acc_optimized}')
    print(f'Optimized Keras sparsity, overall: {sparsity}')
    print(f'Optimized Keras sparsity, per-layer: {layers}')
|
In a similar manner, it is possible to target GPU FLOPs or Vivado DSPs. However, in that case, sparsity is not equivalent to model sparsity.
Instead, it is the sparsity of the target resource. As an example: starting with a network utilizing 512 DSPs and a final sparsity of 50%, the optimized network will use 256 DSPs.
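
As plain arithmetic (an illustration only, not part of the API), the same example reads:

.. code-block:: Python

    # Resource sparsity is defined on the target resource, not on the weights
    baseline_dsps = 512     # DSPs used by the baseline design (example above)
    final_sparsity = 0.5    # target resource sparsity
    remaining_dsps = int(baseline_dsps * (1 - final_sparsity))  # 256 DSPs remain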

To optimize GPU FLOPs, the code is similar to above:

.. code-block:: Python
|
    from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator

    # Optimize the model
    # Note the change from ParameterEstimator to GPUFLOPEstimator
    optimized_model = optimize_model(
        baseline_model, model_attributes, GPUFLOPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate the optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc_optimized}')
    # Note the difference in the total number of parameters
    # Optimizing GPU FLOPs is equivalent to removing entire structures (filters, neurons) from the network
    baseline_model.summary()
    optimized_model.summary()
|
Finally, optimizing Vivado DSPs is possible, given an hls4ml config:

.. code-block:: Python
|
    from hls4ml.utils.config import config_from_keras_model
    from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator

    # Note the change from optimize_model to optimize_keras_model_for_hls4ml
    # optimize_keras_model_for_hls4ml acts as a wrapper around optimize_model, parsing the hls4ml config into model attributes
    from hls4ml.optimization import optimize_keras_model_for_hls4ml

    # Create the hls4ml config
    default_reuse_factor = 4
    default_precision = 'ac_fixed<16, 6>'
    hls_config = config_from_keras_model(baseline_model, granularity='name', default_precision=default_precision, default_reuse_factor=default_reuse_factor)
    hls_config['IOType'] = 'io_parallel'
    hls_config['Model']['Strategy'] = 'Resource'  # Strategy must be present for optimisation
|
    # Optimize the model
    # Note the change from ParameterEstimator to VivadoDSPEstimator
    optimized_model = optimize_keras_model_for_hls4ml(
        baseline_model, hls_config, VivadoDSPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )

    # Evaluate the optimized model
    y_optimized = optimized_model.predict(X_test)
    acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc_optimized}')
|
There are two more Vivado "optimizers": VivadoFFEstimator, aimed at reducing register utilisation, and VivadoMultiObjectiveEstimator, aimed at optimising BRAM and DSP utilisation.
Note that, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesising HLS, by modifying the config:

.. code-block:: Python
|
    hls_config = config_from_keras_model(optimized_model)
    hls_config['Model']['DenseResourceImplementation'] = 'Unrolled'
    # Any additional hls4ml config, such as strategy, reuse factor, etc.
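
With the config in place, the optimized model can be converted and compiled as usual. A minimal sketch is shown below, assuming a Vivado-style backend; the output directory and FPGA part are placeholders to be adjusted for your setup:

.. code-block:: Python

    import hls4ml

    # Convert the optimized Keras model using the modified config
    hls_model = hls4ml.converters.convert_from_keras_model(
        optimized_model,
        hls_config=hls_config,
        output_dir='hls4ml_prj',       # placeholder output directory
        part='xcu250-figd2104-2L-e',   # placeholder FPGA part
    )
    hls_model.compile()

    # Optionally, run synthesis to inspect the DSP utilisation report
    # hls_model.build(csim=False, synth=True)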