Help needed to prompt model #2972
-
Hey folks, sorry to bother you if this is the wrong section, but can someone help me understand what I might be doing wrong here?

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url="https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/resolve/main/openbuddy-llama2-13b-v11.1.Q4_K_M.gguf",
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens; we use the full window here
    context_window=4096,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use the GPU; -1 offloads all layers
    model_kwargs={"n_gpu_layers": -1},
)

response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(llm)
print(response)

It loads correctly: the model goes onto the GPU and the sampling settings in the output look fine, but I don't get any response, no matter the prompt. My guess is that I am missing something in the configuration of the model. Any ideas?
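One detail worth noting in the snippet above: messages_to_prompt and completion_to_prompt are imported but never passed to LlamaCPP, so the raw prompt may reach the model without any chat formatting. A possible variant that wires them in is sketched below; the keyword names follow the llama_index documentation example and may differ in other versions, so treat this as an untested suggestion rather than a confirmed fix.

```python
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_url="https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/resolve/main/openbuddy-llama2-13b-v11.1.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    # format prompts into the Llama-2 template before generation
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    # print llama.cpp load and generation logs, useful for debugging empty output
    verbose=True,
)

response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response)
```

Note that OpenBuddy models document their own prompt template, so the stock Llama-2 helpers may still not be an exact match; checking the prompt format on the model card is worthwhile.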
Replies: 1 comment
-
It looks like you're using https://github.com/jerryjliu/llama_index, which would be using https://github.com/abetlen/llama-cpp-python internally. You're probably better off asking for help there, since you're not using llama.cpp directly but through a binding instead.
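If it helps narrow things down before asking there, one way to isolate the problem is to call the llama-cpp-python binding directly, bypassing llama_index entirely. A minimal sketch, assuming the GGUF file has already been downloaded locally (the path below is a placeholder):

```python
from llama_cpp import Llama

# load the downloaded GGUF model; n_gpu_layers=-1 offloads all layers to the GPU
llm = Llama(
    model_path="./openbuddy-llama2-13b-v11.1.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=-1,
)

# plain completion call; returns an OpenAI-style dict of choices
out = llm(
    "Hello! Can you tell me a poem about cats and dogs?",
    max_tokens=256,
    temperature=0.1,
)
print(out["choices"][0]["text"])
```

If this produces text but the llama_index wrapper does not, the issue is in the wrapper configuration rather than in llama.cpp itself.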