I have a question about position_embedding #3
Replies: 1 comment
-
The authors of the Vision Transformer paper [1] compared the performance of a 1-D learnable embedding against a 2-D encoding method and found no significant difference (Section 3.1). I have tested different positional encodings myself and found that learnable and sinusoidal (original) positional encodings give similar results, whereas more recent, advanced methods can achieve superior results. You can find the implementation and results at https://github.com/s-chh/2D-Positional-Encoding-Vision-Transformer.

[1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
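For reference, here is a minimal sketch (not the linked repository's code, names and shapes are illustrative) of one common fixed 2-D alternative: a sinusoidal table where half of the embedding encodes the patch row and the other half the patch column.

```python
# Minimal sketch of a fixed 2-D sinusoidal positional encoding for a
# ViT-style patch grid. Assumes `grid_size` patches per side and an
# embedding dimension `dim` divisible by 4.
import torch

def sinusoidal_2d_pos_encoding(grid_size: int, dim: int) -> torch.Tensor:
    """Return a (grid_size*grid_size, dim) table: the first half of each
    vector encodes the row index, the second half the column index."""
    assert dim % 4 == 0, "dim must split evenly into row/col sin/cos parts"
    half = dim // 2
    # Frequencies follow the original Transformer recipe: 10000^(-2i/half)
    freqs = torch.exp(-torch.arange(0, half, 2).float() / half
                      * torch.log(torch.tensor(10000.0)))
    pos = torch.arange(grid_size).float()
    angles = pos[:, None] * freqs[None, :]                     # (grid_size, half/2)
    table_1d = torch.cat([angles.sin(), angles.cos()], dim=1)  # (grid_size, half)

    rows = table_1d[:, None, :].expand(grid_size, grid_size, half)  # varies by row
    cols = table_1d[None, :, :].expand(grid_size, grid_size, half)  # varies by column
    return torch.cat([rows, cols], dim=-1).reshape(grid_size * grid_size, dim)

# Example: 14x14 patches (224px image, 16px patches) with dim=768
pe = sinusoidal_2d_pos_encoding(14, 768)
print(pe.shape)  # torch.Size([196, 768])
```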
-
Why does the model add a zero-initialized tensor that is then learned during training at the position-embedding stage, rather than directly adding tensors that are already well encoded and would not need any further learning?
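For context, this is a minimal sketch of the two options being contrasted (hypothetical names, not tied to any particular repository): a zero-initialized learnable parameter that the optimizer updates, versus a fixed, pre-computed table registered as a buffer.

```python
# Sketch only: learnable vs. fixed positional embedding in PyTorch.
import torch
import torch.nn as nn

class PatchPosEmbed(nn.Module):
    def __init__(self, num_tokens: int, dim: int, learnable: bool = True):
        super().__init__()
        if learnable:
            # Starts at zero, but it is an nn.Parameter, so gradients flow
            # into it and the model learns the positional information itself.
            self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        else:
            # A fixed table (e.g. a precomputed sinusoidal encoding) stored
            # as a buffer: it is added to the tokens but never updated.
            table = torch.randn(1, num_tokens, dim)  # placeholder for a real table
            self.register_buffer("pos_embed", table)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) patch embeddings
        return x + self.pos_embed
```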