residual connection #1

@alalio

Hi,

First of all, thank you very much for your implementation; it saved me a lot of time.

I've noticed a minor difference in comparison to the CoAtNet paper (file coatnet.py, lines 205-214).

In the ConvTransformer class, you've defined a residual connection only around the MultiHeadSelfAttention block. However, in the CoAtNet paper, and more broadly in the literature, a residual connection is typically applied to both the dense (feed-forward) projection block and the multi-head attention block. This improves gradient flow through the deep architecture and enhances training stability. Could you please consider updating your code so that future GitHub users can benefit from your excellent work?
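For illustration, here is a minimal PyTorch sketch of the pattern described above, with a residual connection around both sub-blocks. The class and parameter names (`ConvTransformerBlock`, `dim`, `heads`, `mlp_ratio`) are hypothetical and not taken from your repository; the feed-forward block here is a plain two-layer MLP standing in for the dense projection:

```python
import torch
import torch.nn as nn

class ConvTransformerBlock(nn.Module):
    """Illustrative transformer block (hypothetical names, not the repo's code):
    pre-norm with a residual connection around BOTH the self-attention
    block and the dense feed-forward projection."""

    def __init__(self, dim, heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        # Residual #1: around multi-head self-attention
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Residual #2: around the dense (feed-forward) projection
        x = x + self.mlp(self.norm2(x))
        return x

block = ConvTransformerBlock(dim=32)
out = block(torch.randn(2, 16, 32))
print(out.shape)  # torch.Size([2, 16, 32])
```

Because each sub-block adds its output to its input, gradients have an identity path through the whole stack, which is the stability benefit mentioned above.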

Best regards,
Alae
