-
So I reproduced this with the burn scars example from: https://github.com/IBM/terratorch/blob/main/examples/tutorials/PrithviEOv2/prithvi_v2_eo_300_tl_unet_burnscars.ipynb

[Values, metrics, and the corresponding plots were attached as images; not preserved here.]

I think, if this is correct, it is pretty fascinating how well the model performs with a random frozen encoder, a random frozen decoder, and solely a trained head (which I guess is itself a substantial model with 5M parameters).
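For reference, here is a minimal PyTorch sketch of the setup described above, assuming a randomly initialized encoder and decoder that stay frozen while only the head is trained. The module names and shapes are placeholders, not the actual TerraTorch/Prithvi code.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the encoder / decoder / head of a segmentation model.
encoder = nn.Sequential(nn.Conv2d(6, 64, 3, padding=1), nn.ReLU())   # random init, frozen
decoder = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())  # random init, frozen
head = nn.Conv2d(64, 2, kernel_size=1)                               # the only trained part

# Freeze the randomly initialized encoder and decoder.
for module in (encoder, decoder):
    for p in module.parameters():
        p.requires_grad = False

model = nn.Sequential(encoder, decoder, head)

# The optimizer only ever sees the head's parameters.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```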
-
Yes, @Atmoboran. I suggested ELM (Extreme Learning Machine) as an analogy, since we have a large network with many fixed, randomly initialized parameters and only the last layer trainable. Maybe the Universal Approximation Theorem is applicable here, but I would like a more formal opinion on that.
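For anyone unfamiliar with the term, here is a toy NumPy sketch of an Extreme Learning Machine, just to illustrate the analogy (it is unrelated to the actual burn scars model): a random, fixed hidden projection whose linear readout is fitted in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem.
X = rng.normal(size=(500, 10))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(500, 1))

# Random, fixed hidden layer: its weights are never trained.
W = rng.normal(size=(10, 256))
b = rng.normal(size=(1, 256))
H = np.tanh(X @ W + b)

# Only the linear readout is "trained", here via regularized least squares.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(256), H.T @ y)

print("train MSE:", float(np.mean((H @ beta - y) ** 2)))
```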
-
Hi @Atmoboran. You might be interested in testing an experimental branch that uses a Mixture of Experts (MoE) for the head network. You can check it here.
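For illustration only (this is not the code on that branch), a generic PyTorch sketch of what an MoE head could look like: a small gating network produces per-pixel weights over a few expert heads, and the output is their weighted sum.

```python
import torch
import torch.nn as nn

class MoEHead(nn.Module):
    """Toy mixture-of-experts segmentation head (illustrative only)."""

    def __init__(self, in_channels: int, num_classes: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Conv2d(in_channels, num_classes, kernel_size=1)
            for _ in range(num_experts)
        )
        # Gating network: per-pixel softmax weights over the experts.
        self.gate = nn.Conv2d(in_channels, num_experts, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=1)            # (B, E, H, W)
        outputs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C, H, W)
        return (weights.unsqueeze(2) * outputs).sum(dim=1)      # (B, C, H, W)

logits = MoEHead(in_channels=64, num_classes=2)(torch.randn(1, 64, 32, 32))
print(logits.shape)  # torch.Size([1, 2, 32, 32])
```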
-
I was wondering: with which weights do the decoder and the head start if both the encoder and the decoder are frozen?
I'm getting surprisingly good results on a cloud segmentation task with a model in which only the head weights are trainable.
To be precise, the accuracy is only about 9% lower than in the "best" setup with all weights trainable and the pre-trained backbone (a quick way to check which parts are actually trainable is sketched below).
Best regards,
Atmoboran
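A generic PyTorch sketch (placeholder layers, not the actual TerraTorch model) of how to list which parts of a model are trainable and what weights they start from. Freezing only sets `requires_grad = False`: a frozen module keeps whatever initialization it has, i.e. random weights unless pretrained weights were explicitly loaded.

```python
import torch.nn as nn

# Placeholder model; in practice this would be the model built by the task/config.
model = nn.Sequential(
    nn.Conv2d(6, 64, 3, padding=1),   # "encoder" stand-in
    nn.Conv2d(64, 64, 3, padding=1),  # "decoder" stand-in
    nn.Conv2d(64, 2, kernel_size=1),  # "head" stand-in
)

# Pretend the encoder and decoder were frozen by the training setup.
for layer in list(model.children())[:2]:
    for p in layer.parameters():
        p.requires_grad = False

# Each parameter: is it trainable, and what does it start from?
for name, param in model.named_parameters():
    print(
        f"{name:12s} trainable={param.requires_grad} "
        f"mean={param.mean().item():+.4f} std={param.std().item():.4f}"
    )
```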