The bot often gives short answers in chat mode and this makes the bot very boring #8733

Answered by dspasyuk
Zapotecatl asked this question in Q&A
@Zapotecatl Just in case you want to try Llama 3.1, you can run it like so; it works well now:

```shell
./llama.cpp/llama-cli --model ./models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --n-gpu-layers 33 -cnv --simple-io -b 2048 --ctx_size 8000 --temp 0.3 -fa -t 6 --top_k 10 --multiline-input --chat-template llama3 -p 'Role and Purpose: You are Alice, a large language model. Your purpose is to assist users by providing information, answering questions, and engaging in meaningful conversations based on the data you were trained on'
```

Make sure to set the context window size appropriately for your hardware: you will need about 24 GB of VRAM to run it with --ctx_size 128000. In the example above I used --ctx_size 8000.
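To see where that VRAM figure comes from, here is a rough back-of-the-envelope sketch (not from the thread) of the KV-cache memory that grows with --ctx_size. The model constants are assumptions for Llama 3.1 8B (32 layers, 8 KV heads via grouped-query attention, head dimension 128) with a default fp16 cache; the model weights themselves (roughly 5 GB for a Q4_K_M GGUF) come on top of this.

```python
# Rough estimate of KV-cache VRAM as a function of context size.
# Assumed constants for Llama 3.1 8B: 32 layers, 8 KV heads (GQA),
# head dim 128, fp16 cache (2 bytes per element).
def kv_cache_bytes(ctx_size, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    # Two tensors (K and V) per layer; one head_dim vector
    # per token per KV head.
    return 2 * n_layers * ctx_size * n_kv_heads * head_dim * bytes_per_elem

for ctx in (8000, 128000):
    gib = kv_cache_bytes(ctx) / 1024**3
    print(f"ctx_size={ctx}: ~{gib:.1f} GiB KV cache")
```

At --ctx_size 8000 the cache is about 1 GiB, while at 128000 it grows to roughly 15.6 GiB; adding the quantized weights brings the total close to the ~24 GB figure above.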

Answer selected by Zapotecatl