Hi,
First of all, thank you very much for your implementation; it saved me a lot of time.
I've noticed a minor difference compared to the CoAtNet paper (file coatnet.py, lines 205-214).
In the ConvTransformer class, you've added a residual connection only around the MultiHeadSelfAttention block. However, in the CoAtNet paper, and more broadly in the literature, a residual connection is typically applied around both the multi-head attention block and the dense projection (feed-forward) block. This improves gradient flow through the deep architecture and enhances training stability. Could you please consider updating your code so that future GitHub users can benefit from your excellent work?
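
For reference, here is a minimal sketch of the residual pattern I mean, i.e. x = x + Attention(Norm(x)) followed by x = x + FFN(Norm(x)). The module and parameter names below are only illustrative and are not taken from coatnet.py:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Dense projection (MLP) block with pre-normalization."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class TransformerBlockWithResiduals(nn.Module):
    """Illustrative block: residuals around both attention and the FFN."""
    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = FeedForward(dim, dim * mlp_ratio)

    def forward(self, x):
        # Residual connection around the multi-head attention block
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Residual connection around the dense projection (FFN) block as well
        x = x + self.ffn(x)
        return x
```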
Best regards,
Alae