Why are weights not clipped? #598

maicoldubbio · 2024-01-20T12:30:37Z

maicoldubbio
Jan 20, 2024

Hi!

I'm retraining a ResNet32 (from one of the Notebooks) in a HWA fashion. Relevantly, to limit weight bounds during training (which impair accuracy a lot), I set in InferenceRPU these settings: WeightClipParameter(fixed_value=1.0, type=WeightClipType.FIXED_VALUE) and ChannelWise remap. learn_out_scaling = True. My RPU config is very similar to StandardHWATrainingPreset, except for some A/D values and the like.

However by extracting the min/max for each layer, I see that I get some outliers that are actually higher than 1, e.g. 1.5.

Why is that happening?

I wonder whether that could be due to Learning Rate used/number of epochs? Does a low LR (say 0.001) prevent SGD from moving too much from the digital baseline? Or does the HWA training need a given number of epochs to clip?

From what I see, using a low LR allows to reach digital accuracy much quicker -- which makes sense, since a small step might be better to reach again optimum baseline. However if this impairs HWA quality, it's not a good choice.

I don't exclude that perhaps I'm missing something here, e.g. some further settings in RPU config 😃

I'm all ears, any tip is highly appreciated. Thanks for the support!

maljoras · 2024-01-20T14:13:10Z

maljoras
Jan 20, 2024
Maintainer

How do you get your weights from the tile? Note that analog_tile.get_weights() will multiply the digital output scales by default to get the effective weights, which could be any range. The weights are clipped only for the part of the weights that represent the normalized conductances. To get those you have to use analog_tile.get_weights(apply_weight_scaling=False). Those should be clipped.

1 reply

maicoldubbio Jan 20, 2024
Author

Yes, those in the image are scaled. Without scaling they're all within [-1,1] (but I guess they're normalized then?).

Thing is, I cannot explain the difference between 2 very same models except for weight ranges. Consider these 2 models:

The left one performs poorly (drift impairs a lot), the second one is very good (retention alike literature). They've been trained with same RPU (only difference: max_output_size=0 in the left one, 512 right one; however AFAIK it shouldn't make much difference), yet they achieve very different performance results. I should say the left one trained with a higher LR (0.01) and for more epochs, both with scheduling.

What could be responsible for this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Why are weights not clipped? #598

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Why are weights not clipped? #598

Uh oh!

Uh oh!

maicoldubbio Jan 20, 2024

Replies: 1 comment · 1 reply

Uh oh!

maljoras Jan 20, 2024 Maintainer

Uh oh!

maicoldubbio Jan 20, 2024 Author

maicoldubbio
Jan 20, 2024

Replies: 1 comment 1 reply

maljoras
Jan 20, 2024
Maintainer

maicoldubbio Jan 20, 2024
Author