Hey, why is the out projection from the Attention
block optional? See:
vit-pytorch/vit_pytorch/vit.py, line 33 in d47c57e:
`project_out = not (heads == 1 and dim_head == dim)`
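For context, here is a condensed sketch of how that flag is used in the `Attention` module (paraphrased from the repo, with the attention computation itself omitted): when `heads == 1` and `dim_head == dim`, the concatenated head output already has width `dim`, so `to_out` collapses to `nn.Identity()` and no $W^O$ is applied.

```python
import torch.nn as nn

class Attention(nn.Module):
    # Condensed paraphrase of vit_pytorch/vit.py; forward() omitted for brevity.
    def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        inner_dim = dim_head * heads
        # Skip the out projection when a single head already matches the model width.
        project_out = not (heads == 1 and dim_head == dim)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        # W^O (plus dropout) is only applied when project_out is True;
        # otherwise the head output is passed through unchanged.
        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout),
        ) if project_out else nn.Identity()
```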
In the original "Attention Is All You Need" paper there is always an out projection $W^O$ applied to the output of the Attention block, as given by the un-numbered equations in Section 3.2.2.
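For reference, those equations define multi-head attention as

$$
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O,
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V),
$$

with $W^O \in \mathbb{R}^{h d_v \times d_{\text{model}}}$, so the projection is part of the definition regardless of the number of heads.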
The projection is also always applied in the timm library: see
https://github.com/huggingface/pytorch-image-models/blob/ae0737f5d098900180c4457845dda35433ab92c0/timm/models/vision_transformer.py#L105
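For comparison, a hypothetical condensation of what the linked timm code does (not copied verbatim; names and signature may differ between versions): the out projection `proj` is created and applied unconditionally, with no `Identity` branch.

```python
import torch.nn as nn

class TimmStyleAttention(nn.Module):
    # Hypothetical condensation of timm's Attention: `proj` always exists
    # and is always applied in forward(), regardless of num_heads or head_dim.
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)        # out projection, never optional
        self.proj_drop = nn.Dropout(proj_drop)
```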