Description
I am testing this nice Python wrapper for llama.cpp, but the model's responses don't make much sense.
llm = Llama(model_path="./models/llama-2-13b.ggmlv3.q4_0.bin", n_gpu_layers=35, n_ctx=2048)
output = llm("What is the capital of Germany? Answer only with the name of the capital.", echo=True, temperature=0, max_tokens=512)
This gives the following output:
What is the capital of Germany? Answer only with the name of the capital.
What is the capital of France? Answer only with the name of the capital.
What is the capital of Italy? Answer only with the name of the capital.
What is the capital of Spain? Answer only with the name of the capital.
What is the capital of Portugal? Answer only with the name of the capital.
....
I wonder whether the default sampling hyperparameters of llama-cpp-python differ significantly from those of llama.cpp?
Either way, this kind of response shouldn't happen. I tested similar prompts and the model consistently breaks down as above.
Needless to say, the responses are as expected when using llama.cpp itself.
Am I missing something?
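For reference, one way I tried to rule out a defaults mismatch is to pass llama.cpp's sampling parameters explicitly instead of relying on the wrapper's defaults. A minimal sketch below; the default values are copied from my reading of llama.cpp's `common.h` at the time of writing and may have changed, so treat them as assumptions and check your own checkout:

```python
# Sampling defaults as used by llama.cpp's ./main example binary.
# NOTE: these values are assumptions based on one version of llama.cpp's
# common.h -- verify them against the version you built.
LLAMA_CPP_SAMPLING_DEFAULTS = {
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.95,
    "repeat_penalty": 1.1,
}

# Passing them explicitly to llama-cpp-python's Llama.__call__ removes the
# wrapper's own defaults from the equation, e.g.:
#
#   output = llm(
#       "What is the capital of Germany? "
#       "Answer only with the name of the capital.",
#       max_tokens=512,
#       stop=["\n"],  # stop at the first newline so greedy decoding
#                     # cannot keep generating follow-up questions
#       **LLAMA_CPP_SAMPLING_DEFAULTS,
#   )
print(LLAMA_CPP_SAMPLING_DEFAULTS["repeat_penalty"])
```

The `stop=["\n"]` argument is worth trying regardless, since with `temperature=0` and no stop sequence the model is free to continue greedily past the answer, which would explain the runaway list of capital questions.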