-
I'm trying to interface https://github.com/paul-gauthier/aider.git with llama's stream (instead of GPT).
stream is working, and from a web browser it offers some options.
Suppose I wanted the prompt to be "you are a 1960's hippie that likes to use flowery language".
At first it's ignored;
then it works, but it keeps on conversing with itself. My 2nd question is that stream seems slow compared to textgen-web-ui running the same model. I think the difference is that the GPU memory can be set in textgen-web-ui; how do I set that in stream?
-
If you want to use curl directly, or something like that, for streaming. As for 2.:
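A minimal sketch of what such a curl call could look like, assuming a llama.cpp `server` instance listening on localhost port 8080 (the port, prompt text, and parameter values here are placeholders, not from the original thread):

```shell
# Request a streamed completion from a running llama.cpp server
# (assumes ./server was started separately and listens on :8080).
curl --no-buffer http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "You are a 1960s hippie that likes to use flowery language.\nUser: hello\nAssistant:",
        "n_predict": 128,
        "stream": true
      }'
# With "stream": true the server replies with server-sent events,
# one "data: {...}" line per generated token chunk.
```

Note that `--no-buffer` matters for streaming: without it, curl may buffer the output and you won't see tokens arrive incrementally.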
`-ngl` or `--n-gpu-layers` is for offloading to GPU. It must be in the output of `main -h`; if it isn't, you compiled it without GPU support.
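Concretely, the check and the flag usage could look like this (the model path and layer count below are illustrative placeholders, not values from this thread):

```shell
# Verify this build of llama.cpp has GPU offload support:
./main -h | grep -- "--n-gpu-layers" || echo "no GPU offload in this build"

# Offload some layers to the GPU when running a model
# (model path and layer count are example values; tune -ngl
# to fit your model within available GPU memory):
./main -m ./models/model.gguf -ngl 35 -p "Hello"
```

Raising `-ngl` until you run out of VRAM is the usual way to close the speed gap with textgen-web-ui, which exposes the same layer-offload setting in its UI.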