Hey Ladies & Gents,
Like many of us, including some regular contributors, I'm often wrestling with RoPE settings to get the best perplexity curve out of the model I'm running.
As basic background: Llama 1 & 2 have a base RoPE frequency (theta) of 10,000, while Code Llama has a base of 1,000,000.
Llama 1 is trained on 2,048-token sequences, Llama 2 on 4,096, and Code Llama on 16,384. Some people pretrain/train/finetune (what's the difference?) custom models on longer sequences, notably on top of Llama 1 & 2.
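For anyone who wants to see where that base number actually enters the math, here is a minimal sketch of the per-dimension RoPE frequencies. It assumes a Llama-style head dimension of 128 (an assumption, not read from a model file) and is only meant to show why a larger base slows the rotation down:

```python
# Minimal sketch of the per-pair RoPE frequencies: theta_i = base^(-2i/d).
# head_dim = 128 is an assumption (the usual Llama head size).
def rope_inv_freqs(base: float, head_dim: int = 128) -> list[float]:
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

llama_freqs = rope_inv_freqs(10_000.0)         # Llama 1 & 2 default base
codellama_freqs = rope_inv_freqs(1_000_000.0)  # Code Llama default base

# The larger base makes the low-frequency components rotate much more slowly,
# which is what lets Code Llama keep positions distinguishable over 16k tokens.
print(llama_freqs[-1], codellama_freqs[-1])
```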
A scale factor, i.e. linear RoPE scaling or Position Interpolation (PI, what SuperHOT uses; I believe these are the same thing), basically works on extended context / original context: scale 2 = 2,048 × 2 = 4,096 context, at the cost of an overall loss in perplexity. The RoPE frequency scale is 1 / scale factor.
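To make that concrete, here is a small sketch of what linear scaling does to the rotation angle, as I understand it (the numbers are illustrative):

```python
# Sketch of linear RoPE scaling (Position Interpolation / SuperHOT-style):
# the position index is simply divided by the scale factor before rotating.
def pi_rope_angle(pos: int, inv_freq: float, scale: float = 2.0) -> float:
    """Rotation angle for one frequency; scale > 1 compresses positions so an
    extended context maps back into the range the model was trained on."""
    return (pos / scale) * inv_freq

# With scale = 2, position 4096 produces the same angles that position 2048
# did during training, which is why llama.cpp expresses this as
# rope_freq_scale = 1 / scale = 0.5.
```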
NTK v1 works differently: instead of a scale factor you set a different frequency base (or an alpha value). There's an equation linking the two for Llama 1 & 2, and another (approximate) one linking alpha / the base frequency to the optimal max context (that's where I am for now). If I understand properly, that hasn't been figured out for Code Llama, but Code Llama is much more robust on its base of 1,000,000 no matter what the context length is.
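For reference, the alpha-to-base relation I've seen quoted for NTK-aware (v1) scaling on Llama-style models is new_base = base × alpha^(d / (d − 2)). The sketch below assumes head_dim d = 128 and should be read as illustrative rather than authoritative:

```python
# Commonly quoted NTK-aware (v1) relation between alpha and the adjusted base.
# Assumes head_dim = 128 (standard Llama); the exponent depends on that choice.
def ntk_base_from_alpha(alpha: float, base: float = 10_000.0,
                        head_dim: int = 128) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

print(ntk_base_from_alpha(2.0))  # ~20_200, often treated as roughly a 2x context target
print(ntk_base_from_alpha(4.0))  # ~40_900, roughly a 4x context target
```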
On Llama 1 & 2, we can even use together PI and NTK to reach a higher context length without too much damages on perplexity, but that makes an even more complex equation to link both and chose the correct couple of base and scale, and I'm not algebra savvy.
My question is simple, but calls for a complex answer 👍
Could the RoPE experts around here write a wiki page about the various RoPE techniques, how to use them and how to combine them according to the Llama model in use (its underlying base model and any later customization), or, even better, integrate a reliable RoPE calculation system into the llama.cpp engine based on all the relevant parameters for Llama 1, Llama 2 and Code Llama (this one is trickier)?