potential context processing Levenshtein culling optimization #3956
Replies: 2 comments 10 replies
-
I'm finding it hard to follow the description. It would be better if you provided a specific example.
-
Aren't they all deterministic?
If you know whether the next token beats the difference, then don't you already know the next token? Unfortunately, the description is pretty hard to follow. Are you trying to say something like given certain tokens it's basically guaranteed that other ones will follow. For example, like if you have |
-
I have been messing with a personal deterministic model architecture and came up with a potential context optimization, though I don't know whether the required information is available in traditional models, or whether this is already done. This is based on theory-crafting in my head; I haven't tried implementing it yet.
Requirements:
Knowledge of the tokens that follow the previous token in the dataset
The Levenshtein distance over the entire context for the top-scoring (lowest to highest) tokens
A variable-length vocabulary
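For reference, the Levenshtein distance mentioned in the requirements is the standard edit distance (minimum number of insertions, deletions, and substitutions). A minimal dynamic-programming implementation, just for illustration:

```python
def levenshtein(a: str, b: str) -> int:
    # Row-based DP: prev[j] holds the edit distance between
    # the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```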
Process:
Find the difference in Levenshtein distance between the top-scoring tokens.
If the next token can't beat this difference, it must be one of the tokens that follow the previous token in the dataset.
With this knowledge, we can reduce the set of tokens to process to those long enough to change the Levenshtein distance by at least that difference.
You could further reduce the number to process by trimming the beginning and end of each token where it matches the context, since those characters would not influence the Levenshtein distance.
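The culling steps above could be sketched roughly like this. All names here (`cull`, `trim`, `gap`, `followers`) are hypothetical, and the trimming step is my reading of the proposal rather than a definitive implementation:

```python
def shared_prefix_len(a: str, b: str) -> int:
    # Length of the longest common prefix of a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def cull(candidates, gap, followers):
    # Keep only tokens long enough to change the Levenshtein
    # distance by at least `gap`; shorter tokens only survive if
    # they are known followers of the previous token in the dataset.
    return [t for t in candidates if len(t) >= gap or t in followers]

def trim(token: str, context_tail: str) -> str:
    # Drop leading/trailing characters of the token that match the
    # context verbatim, since exact matches cannot influence the
    # Levenshtein distance (assumed interpretation of the proposal).
    start = shared_prefix_len(token, context_tail)
    token = token[start:]
    end = shared_prefix_len(token[::-1], context_tail[::-1])
    return token[:len(token) - end] if end else token
```

Usage would be something like `[trim(t, context) for t in cull(candidates, gap, followers)]` before running the full distance computation on the survivors.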
(The rest is more speculative, since I can't fully think it through.)
If, after calculation, you do find tokens with a bigger Levenshtein distance, keep track of the largest one.
My guess is that you would then be looking at the tokens that follow, in the dataset, the token with the smallest Levenshtein distance that is still bigger.