Replies: 1 comment
-
Independent projects can make their own choices. They aren't really "backends"; if anything, GGML or llama.cpp as a library is the backend. Better customizability is obviously great, though.
-
It was observed in my Min P pull request that llama.cpp orders its samplers differently from other LLM inference backends:
The koboldcpp fork has its own solution for this: an internal 'sampler order' that the end user can configure to accommodate their own preferred order, since the transformers library expects that temperature always comes first.
This lets the user choose either the 'official' order, where Temperature comes first, or the alternative that llama.cpp currently assumes, where Temperature comes last.
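A configurable sampler order like the one described above can be sketched as a small pipeline. This is an illustrative assumption only; the function names, signatures, and defaults here are not koboldcpp's actual implementation:

```python
import math

def apply_temperature(logits, temp=1.0):
    # Scale logits by 1/temp; higher temp flattens the distribution.
    return [x / temp for x in logits]

def top_k(logits, k=40):
    # Keep the k highest logits; mask the rest with -inf.
    cutoff = sorted(logits, reverse=True)[k - 1] if k < len(logits) else min(logits)
    return [x if x >= cutoff else -math.inf for x in logits]

# Registry of samplers; the user-supplied order decides who runs first.
SAMPLERS = {"temperature": apply_temperature, "top_k": top_k}

def run_pipeline(logits, order):
    """Apply samplers in a user-configurable order."""
    for name in order:
        logits = SAMPLERS[name](logits)
    return logits

# Transformers-style: temperature first, then truncation.
hf_style = run_pipeline([5.0, 3.0, 1.0], ["temperature", "top_k"])
# llama.cpp-style: truncation first, temperature last.
llama_style = run_pipeline([5.0, 3.0, 1.0], ["top_k", "temperature"])
```

The point of the registry-plus-order design is that neither sampler needs to know about the other; reordering is purely a configuration change.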
Modifying this order has a significant impact on how temperature changes the model's outputs. If Temperature comes last, with a 'safe' Min P or Top P before it, the output can still read correctly even with an obscenely high temperature value. But if Temperature comes first, as the Transformers library (and the official GPT-2 implementation) expects, it has a much different effect: the values are modified before the truncation-based samplers run, so those samplers operate on a different 'scale'. For example, a token that would fall under the 'minimum' in a normal Min P setup might clear the required percentage if a high temperature was applied first.
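The scale effect described above can be demonstrated numerically. The sketch below (hypothetical values, and a simplified Min P that keeps tokens whose probability is at least `min_p` times the top token's probability) shows that a high temperature applied before Min P flattens the distribution enough that tokens Min P would otherwise truncate now survive:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temp):
    return [x / temp for x in logits]

def min_p_keep(logits, min_p):
    """Indices of tokens whose probability is >= min_p * (top token's probability)."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

logits = [5.0, 3.0, 1.0, -1.0]

# Temperature last: Min P filters the raw distribution first,
# and only two tokens survive the threshold.
temp_last = min_p_keep(logits, min_p=0.1)

# Temperature first: a high temperature flattens the distribution
# before Min P runs, so the same min_p now keeps every token.
temp_first = min_p_keep(apply_temperature(logits, temp=3.0), min_p=0.1)

print(temp_last, temp_first)
```

Identical `min_p` and temperature settings, yet the surviving token set differs purely because of ordering, which is exactly the reproducibility problem raised below.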
The lack of standardization (or customization) here is a problem for reproducibility and consistency with local models; the same set of sampler settings can have a totally different effect on another backend (such as text-generation-webui's HF loaders, which assume the Transformers order of temperature first).