PyTorch implementation of 2D positional encodings for Vision Transformers (ViT). Positional encodings/embeddings covered: Sinusoidal (Absolute), Learnable, Relative, and Rotary (RoPE); a minimal sinusoidal sketch is shown below the listing.
vit-positional-encoding vision-transformer-positional-encoding vit-rope vit-relative vit-sinusoidal vit-no-position vit-learn-position
Updated Dec 16, 2024 · Python
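As a rough illustration of the first variant listed above, here is a minimal sketch of a fixed 2D sinusoidal (absolute) positional encoding for a ViT patch grid. The function name, arguments, and the even split of channels between the row and column axes are assumptions for this example, not taken from the repository's API.

```python
# Sketch: 2D sinusoidal positional encoding for a ViT patch grid.
# Assumes the embedding dimension is split evenly between the height
# and width axes; names here are illustrative, not the repo's API.
import torch


def sinusoidal_2d_positional_encoding(height: int, width: int, dim: int) -> torch.Tensor:
    """Return a (height * width, dim) tensor of fixed 2D sin/cos encodings."""
    assert dim % 4 == 0, "dim must be divisible by 4 (sin and cos per axis)"
    dim_per_axis = dim // 2

    # Frequencies follow the standard 1 / 10000^(2i/d) schedule per axis.
    omega = torch.arange(dim_per_axis // 2, dtype=torch.float32)
    omega = 1.0 / (10000 ** (omega / (dim_per_axis // 2)))

    ys, xs = torch.meshgrid(
        torch.arange(height, dtype=torch.float32),
        torch.arange(width, dtype=torch.float32),
        indexing="ij",
    )

    def encode(pos: torch.Tensor) -> torch.Tensor:
        # pos: (H, W) -> (H * W, dim_per_axis) as [sin | cos] features.
        angles = pos.reshape(-1, 1) * omega
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    # First half of the channels encodes the row index, second half the column.
    return torch.cat([encode(ys), encode(xs)], dim=-1)


if __name__ == "__main__":
    # Example: 14x14 patch grid (224x224 image, 16x16 patches), 768-dim tokens.
    pe = sinusoidal_2d_positional_encoding(14, 14, 768)
    print(pe.shape)  # torch.Size([196, 768])
```

In a ViT forward pass, a tensor like this is typically added to the patch embeddings before the transformer blocks; the learnable variant replaces it with an `nn.Parameter` of the same shape, while relative and RoPE variants act inside the attention computation instead.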