RoPE
#409
-
I have noticed that RoPE is very similar to the standard sinusoidal positional encoding:

For even indices:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$

For odd indices:

$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$

Where: $pos$ is the token position, $i \in \{0, 1, \ldots, d_{\text{model}}/2 - 1\}$ indexes the sin/cos pair, and $d_{\text{model}}$ is the embedding dimension.
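
For reference, here is a minimal sketch of the encoding above in PyTorch; the function name `sinusoidal_pe` and its arguments are my own, just for illustration:

```python
import torch

def sinusoidal_pe(max_len: int, d_model: int) -> torch.Tensor:
    # Token positions: shape (max_len, 1) for broadcasting
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    # Even dimension indices 2i = 0, 2, 4, ..., d_model - 2
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)
    # Inverse frequencies: 1 / 10000^(2i / d_model)
    inv_freq = 1.0 / (10000 ** (two_i / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * inv_freq)  # even indices
    pe[:, 1::2] = torch.cos(pos * inv_freq)  # odd indices
    return pe
```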
In the original paper, the inverse frequencies use a factor of $-2$ in the exponent,

$$\theta_i = 10000^{-2(i-1)/d}, \quad i \in \{1, 2, \ldots, d/2\},$$

which seems to come from the factor $2i$ in the PE formula above; see 3.3 Properties of RoPE (p. 5) in the paper.

Do you know why the Llama 2 and Llama 3 models (3, 3.1, 3.2) do not use such a scaling factor, and does this have any benefits?
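
To make the comparison concrete, here is a sketch of the paper's $\theta_i$ next to a Llama-style inverse-frequency computation (variable names are mine; I may be misreading which factor is meant):

```python
import torch

d = 64  # head dimension (illustrative value)

# RoFormer paper, Sec. 3.3: theta_i = 10000^(-2(i - 1)/d) for i = 1, ..., d/2
i = torch.arange(1, d // 2 + 1, dtype=torch.float32)
theta_paper = 10000 ** (-2 * (i - 1) / d)

# Llama-style inverse frequencies: 1 / 10000^(arange(0, d, 2) / d);
# the step of 2 in arange contributes the same factor of 2,
# just with 0-based instead of 1-based indexing
inv_freq_llama = 1.0 / (10000 ** (torch.arange(0, d, 2, dtype=torch.float32) / d))

print(torch.allclose(theta_paper, inv_freq_llama))  # True, up to floating-point error
```

At least written this way, the two seem to coincide numerically, so maybe I am comparing the wrong pieces of code.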
-
These are good observations/questions. I will try to get back to you on that some day.