Replies: 1 comment
It would be great if such a script could also act as a "model/rope auto-loader" by injecting a configuration understood by llama.cpp, with commands to swap the rope config at a given context size, and/or to order a reload of the same model (or a different one) with new rope parameters at a given context size. That would give the best of both worlds: a sharp model at first, with a growing rope, then, once a long context already provides a strong basis, a smaller model with a longer available context, again with a growing rope, and so on and so forth, from 70B to 33B to 13B to 7B (YaRN rope is coming!), reaching a 100k context on a single 24GB graphics card with the incoming KV8_0!
In more detail, take a basic example: start with a 33B/34B model at its base context size and rope, then grow the ctx and rope frequency at every specified interval of filled ctx beyond the model's base context size (+512, +1024, etc.). Once memory is almost full with the 33B model, drop to a 13B model with extended context and rinse and repeat the max ctx / rope game, then to an extended 7B and rinse and repeat again. The goal is an optimal starting context, then keeping the context whole for as long as possible whenever that is more opportune than summarization, with a seamless transition from one ctx/rope/model combination to the next.
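For illustration, such a swap schedule could be expressed in the same .ini style as the profiles below. This is only a sketch: --rope-freq-scale is a real llama.cpp flag the rope key would map to, but the staged layout, the model file names, and the switch_at_ctx key are all hypothetical, and llama.cpp has no runtime hot-swap mechanism today, so the loader would have to relaunch the binary at each threshold.
[stage1]
model = 33b-q4_0.bin
ctx_size = 4096
rope_freq_scale = 1.0
[stage2]
model = 33b-q4_0.bin
ctx_size = 8192
rope_freq_scale = 0.5
switch_at_ctx = 3584
[stage3]
model = 13b-q4_0.bin
ctx_size = 16384
rope_freq_scale = 0.25
switch_at_ctx = 7680
Each stage names a model, a context ceiling, and a rope scale; when the filled context crosses switch_at_ctx, the loader would relaunch with the next stage's parameters and carry the conversation over.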
Due to the numerous parameterization possibilities when loading models, I propose a layer responsible for loading parameterization profiles (templates), which would bring the benefits described below.
My development environment runs Windows, so I created a PowerShell script that acts as a configuration file (.ini) loader and accepts optional parameters such as "model", "number of model parameters", and "prompt". At the moment it can also receive a "perplexity" flag to switch the execution binary as required, and the nice thing is that profiles can be loaded for perplexity runs too. If a parameter is not given, the script loads default values for a clean run of llama with the standard 7B model in interactive mode.
Besides the script, I also created a "profiles" folder to store the .ini configuration files.
Using this logic I converted the current documented run examples into profiles to be loaded by the script, such as alpaca.ini, gpt4all.ini, llama.ini, etc.
This approach also allows for the combination of different prompts and templates, which expands the tool's exploration possibilities.
I believe that such a script in Bash or Python would also make sense and would allow the same gains on other operating systems.
Examples
Current command line execution:
.\bin\Release\main.exe -m C:\.ai\.models\alpaca\13B\ggml-alpaca-13b-ggjt-q4.bin --color --n_predict 512 --ctx_size 2048 --top_k 10000 --temp 0.2 --repeat_penalty 1 --threads 24 --instruct --interactive --reverse-prompt "User:" --file ./prompts/alpaca.txt
Proposed command line execution:
.\profile_loader.ps1 -profile alpaca
The content of the alpaca.ini profile:
name = alpaca
model = ggml-alpaca-lora-q4_0-ggjt.bin
color =
batch_size = 256
top_k = 10000
temp = 0.2
repeat_penalty = 1.0
threads = 12
instruct =
prompt = alpaca
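For illustration, the core of such a loader fits in a short PowerShell sketch. This assumes the flat key = value layout above; the parameter handling and the model/prompt path conventions are illustrative, not taken from the actual script.
param(
    [string]$profile = "llama",   # profile name, e.g. "alpaca"
    [switch]$perplexity           # use perplexity.exe instead of main.exe
)

# Parse .\profiles\<name>.ini into ordered key/value pairs.
# An empty value (e.g. "color =") marks a bare flag.
$settings = [ordered]@{}
Get-Content ".\profiles\$profile.ini" | ForEach-Object {
    if ($_ -match '^\s*([^=\s]+)\s*=\s*(.*)$') {
        $settings[$Matches[1]] = $Matches[2].Trim()
    }
}

# Pick the binary, then translate each key into a llama.cpp flag.
$exe = if ($perplexity) { ".\bin\Release\perplexity.exe" } else { ".\bin\Release\main.exe" }
$cliArgs = @()
foreach ($entry in $settings.GetEnumerator()) {
    switch ($entry.Key) {
        "name"   { }  # profile metadata, not a flag
        "model"  { $cliArgs += "-m", ".\models\$($entry.Value)" }
        "prompt" { $cliArgs += "--file", ".\prompts\$($entry.Value).txt" }
        default  {
            $cliArgs += "--$($entry.Key)"
            if ($entry.Value) { $cliArgs += $entry.Value }
        }
    }
}

& $exe @cliArgs
With the alpaca.ini above, this reconstructs a command line much like the long one shown earlier, and adding a new run configuration is then just a matter of dropping another .ini into the profiles folder.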
You can check out my current script here, but don't expect anything too complex or advanced; I'm not an expert, I'm just trying things out and want to know whether this idea makes sense to anyone else.