improving llama.cpp prompt example... #3216
---
It's the LLaMA 1 models that are mostly trained with a 2,048-token context. LLaMA 2 is usually 4,096.

For non-instruction-tuned models, which this one appears to be, you want to write the prompt so that the model can complete what you wrote. For example:

> The following is a detailed encyclopedia-style article about Elon Musk which consists of 2400 words. The article is written in English and formatted in markdown. Blah blah blah. You can find the entire article below this line:

Obviously you don't write "blah blah"; I just didn't write out the whole prompt for you. The point is: if something is completing the text you wrote, it wouldn't make sense for the article not to be below that point, right? So the prompt is structured so that the natural continuation of the text is exactly what you want the model to write. That's non-instruct prompting. Don't think of it like a conversation or question-and-answer; think of it like a shared text editor where you write some stuff and then the LLM comes along and tries to finish it.

For instruct-tuned models, you need to take a different approach. Different models use different prompting styles, so you should look at the model card (or similar) to see how to prompt them.
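For example, the official Llama-2-chat fine-tunes are trained on a template along these lines (a sketch from memory, so double-check the model card for your specific fine-tune; the system prompt here is only a placeholder):

```
<s>[INST] <<SYS>>
You are a helpful assistant that writes encyclopedia-style articles in markdown.
<</SYS>>

Write a detailed, 2400-word encyclopedia-style article about Elon Musk. [/INST]
```

A LoRA-merged model like the one in this thread may expect a different template entirely, which is exactly why the model card matters.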
---
lora the "do not mention things you are not sure or do not know." really helps with hallucination. it doesnt do the inline markdown references like gpt4 though. if u guys have any ideas how to make it formatted with markdown, pls do mention. chatgpt will show the output i wanted with the prompt below:
with reference to reducing hallucination here: #3209
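For clarity, this is the reference-style link syntax I mean (standard markdown); one idea I might try is embedding a complete worked example like this in the prompt as a few-shot demonstration, rather than only describing the format:

```markdown
This is [an example][1] of a reference-style link in running text.

[1]: https://example.com/ "Optional Title Here"
```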
Does anyone know how to make the model produce the markdown reference-link format shown? Try the prompt on ChatGPT (GPT-3) and it will show what I want, but llama.cpp doesn't. Also, I used -c 3620 (because it fits into the 8 GB of VRAM "perfectly"), but I read somewhere that all Llama 2 models are trained with a 2,048-token context, so 3620 "is not advisable"? Is this true? Is there anything else I can improve in my Llama prompt? I'm using GPT-4 to rephrase my Llama prompt at this stage, so I'm not sure if any of you have "cheatsheet" prompts specifically for Llama 2. Will appreciate all info sharing here. Thanks.

This is the prompt and the t/s I got on my system (HP Victus laptop: Ryzen 5, RTX 4060 with 8 GB VRAM, 16 GB RAM):
```
llama_print_timings: load time = 762.10 ms
llama_print_timings: sample time = 887.69 ms / 2394 runs ( 0.37 ms per token, 2696.88 tokens per second)
llama_print_timings: prompt eval time = 138.33 ms / 101 tokens ( 1.37 ms per token, 730.14 tokens per second)
llama_print_timings: eval time = 60778.50 ms / 2393 runs ( 25.40 ms per token, 39.37 tokens per second)
llama_print_timings: total time = 63100.61 ms
Log end
```
```sh
root@ubuntu:/usr/local/src/llama.cpp# ./main -m models/llama-2-7b-lora-assemble.Q4_K_M.gguf -ngl 35 -c 3620 -n 12288 -p "Detailed encyclopedia-style article titled 'elon musk' with a minimum of 2400 words. The content should be in English and formatted in markdown. Do not mention things you are not sure or do not know. Structured with headings, an intro, and conclusion. Include inline citations, external/internal links (excluding images), and the markdown reference link format
This is [an example][id] reference-style link; [id]: http://example.com/ \"Optional Title Here\"
. Integrate advanced markdown elements and a table of contents where appropriate." -e -t 1
```
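Based on the reply above (Llama 2 is trained with a 4,096-token context, not 2,048), a variant of the same command with -c 4096 is probably the safer starting point. This is an untested sketch: `<same prompt as above>` is a placeholder, and if a 4,096-token context no longer fits in the 8 GB of VRAM, lowering -ngl offloads fewer layers to the GPU at some speed cost.

```sh
# Same model and prompt as before; only the context size changed to match
# Llama 2's 4,096-token training context, and -n chosen so the ~100-token
# prompt plus the generation stays within the 4,096 window.
# Assumption: the KV cache still fits in 8 GB of VRAM at -ngl 35; if not,
# reduce -ngl to offload fewer layers.
./main -m models/llama-2-7b-lora-assemble.Q4_K_M.gguf -ngl 35 -c 4096 -n 3900 \
  -e -t 1 -p "<same prompt as above>"
```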