At the moment, LLM software seems to have a schism in the implementation of the temperature option, as I documented here: #3914
I realized that it might be best to modify it so that you can do two temperature 'passes':
An Input Temperature, which runs before any other sampler and rescales the original distribution before anything else modifies it
An Output Temperature, which comes last, after all the truncation samplers (such as Top K, Top P, etc.) have run
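To make the ordering concrete, here is a minimal Python sketch of the pipeline I have in mind. The function and parameter names (`sample_two_pass`, `input_temp`, `output_temp`) are hypothetical illustrations, not existing llama.cpp API, and Top K stands in for the whole truncation stage:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, dividing by temperature first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_two_pass(logits, input_temp=1.0, top_k=40, output_temp=1.0):
    """Hypothetical two-pass temperature pipeline:
    1. Input Temperature rescales the raw distribution.
    2. Truncation (Top K here) keeps only the best candidates.
    3. Output Temperature rescales the surviving candidates.
    """
    # Pass 1: input temperature on the full distribution.
    probs = softmax(logits, temperature=input_temp)

    # Truncation: keep the top_k most likely candidates.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Pass 2: output temperature over the surviving candidates only.
    kept_logits = [math.log(probs[i]) for i in ranked]
    final_probs = softmax(kept_logits, temperature=output_temp)

    # Sample a token id from the final distribution.
    return random.choices(ranked, weights=final_probs, k=1)[0]
```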
The expected implementation of Temperature (as it is used in OpenAI's models and in other inference backends) is to modify the original distribution, so that truncation samplers such as Top P or Top K aren't strictly necessary. The current implementation in llama.cpp, however, behaves like my description of Output Temperature: it is applied after the truncation samplers.
This can be confusing, because truncation changes the output in a way that is very similar to lowering the temperature, except it explicitly cuts out bad choices rather than scaling the model's confidence. That approach isn't flawed, but we want interpretability in how the model responds to sampler changes, instead of people setting options they don't understand and getting a skewed, sometimes unnatural representation of what the model is actually predicting.
This would give users freedom because:
You can apply temperature after truncation has selected a set of high quality candidates, to 'randomize' among them without inviting low quality token choices; it simply helps the model avoid overly predictable outputs while staying in a safe range.
You can apply temperature before truncation if you want the raw probabilities to be a little less pre-determined overall before the unlikely candidates are cut out. (Both cases are illustrated in the usage sketch below.)
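Using the hypothetical `sample_two_pass` sketch above, the two use cases would look roughly like this (the logit values are toy numbers for illustration):

```python
logits = [2.0, 1.5, 0.3, -1.0, -2.5]  # toy logits for five tokens

# Post-truncation randomization: keep the raw ranking intact, then
# add variety only among the surviving high quality candidates.
token = sample_two_pass(logits, input_temp=1.0, top_k=3, output_temp=1.3)

# Pre-truncation flattening: make the raw probabilities less
# pre-determined before the unlikely candidates are cut out.
token = sample_two_pass(logits, input_temp=1.3, top_k=3, output_temp=1.0)
```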