Inquiry: Fusion Encoder in Downstream Task #8

JungMinKyun · 2025-04-11T04:03:34Z

First of all, thank you very much for your research; I'm really interested in this work.

I have a question regarding the "Fusion Encoder" component of the downstream task,
specifically in the context of the linear probe evaluation.

My understanding is that, for the linear probe, you freeze the pre-trained encoder,
attach a classifier head on top, and then perform classification.
In the code, I see that you load pre-trained weights for the Image and Audio encoders, but for the Fusion Encoder you initialize it randomly, freeze it, and then train only the classifier.

I wonder whether it might make sense to load pre-trained weights for the Fusion Encoder as well, or at least allow it to be learnable if it is randomly initialized.
(If I am misunderstanding something here, please let me know.)
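
To make the question concrete, here is a minimal, framework-agnostic sketch of the three setups I have in mind. It is not taken from the repository: the component names, `ComponentPlan`, and `plan_linear_probe` are hypothetical and exist only to illustrate which parts would be pre-trained, frozen, or trainable in each case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentPlan:
    init: str        # "pretrained" or "random"
    trainable: bool  # whether gradients should update this component


def plan_linear_probe(components, pretrained, train_random=True):
    """Decide, per component, how it is initialized and whether it is trained.

    Hypothetical policy: components with a checkpoint are loaded and frozen;
    components without one are randomly initialized and trained only if
    `train_random` is True. The classifier head is always trained.
    """
    plan = {}
    for name in components:
        if name in pretrained:
            plan[name] = ComponentPlan(init="pretrained", trainable=False)
        else:
            plan[name] = ComponentPlan(init="random", trainable=train_random)
    plan["classifier"] = ComponentPlan(init="random", trainable=True)
    return plan


# Assumed component names, for illustration only.
components = ["image_encoder", "audio_encoder", "fusion_encoder"]
unimodal = {"image_encoder", "audio_encoder"}

# My reading of the current code: fusion encoder is random AND frozen.
current = plan_linear_probe(components, unimodal, train_random=False)

# Suggestion 1: also load pre-trained weights for the fusion encoder.
load_fusion = plan_linear_probe(components, set(components))

# Suggestion 2: keep the random initialization but let it learn.
train_fusion = plan_linear_probe(components, unimodal, train_random=True)
```

In the `current` plan the fusion encoder ends up as a fixed random projection between the frozen encoders and the classifier, which is the part I am unsure about.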

Thank you for your time. I would greatly appreciate any clarification or correction.
