Replies: 4 comments
-
Hey @xKwan! I'm here to help with your issue. I can assist with bug fixes, answer questions, and guide you to become a contributor. Let's tackle this challenge together!

To resolve the KeyError: 'text', here's a potential fix for the complete method:

def complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
    kwargs = kwargs if kwargs else {}
    params = {**self._model_kwargs, **kwargs}
    # build sampling parameters
    sampling_params = dict(**params)
    sampling_params["prompt"] = prompt
    response = post_http_request(self.api_url, sampling_params, stream=False)
    output = get_response(response)
    # Ensure the response contains the 'text' key
    if 'text' not in output[0]:
        raise KeyError("The response does not contain the 'text' key")
    return CompletionResponse(text=output[0]['text'])

This code checks whether the 'text' key is present in the response before accessing it. Additionally, ensure that your VllmServer is set up correctly:

from llama_index.llms.vllm import VllmServer

def messages_to_prompt(messages):
    prompt = "\n".join([str(x) for x in messages])
    return f"<s>[INST] {prompt} [/INST] </s>\n"

def completion_to_prompt(completion):
    return f"<s>[INST] {completion} [/INST] </s>\n"

llm = VllmServer(
    api_url="http://localhost:8000/v1/completions",
    max_new_tokens=256,
    temperature=0.1,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

llm.complete("What is a black hole?")

Ensure that the api_url points to your running vLLM server and that the endpoint is reachable.
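If the error persists, it can help to inspect the raw JSON the server returns before it reaches VllmServer. Below is a minimal sketch of such a check, assuming the OpenAI-compatible vLLM server started in the issue; the model path and prompt are placeholders.

import requests

# Query the vLLM server directly and print the raw JSON, so you can see whether
# generations arrive under a top-level "text" key or nested under "choices".
payload = {
    "model": "/path/to/my/local/model",  # placeholder: the model the server was started with
    "prompt": "What is a black hole?",
    "max_tokens": 64,
    "temperature": 0.1,
}
resp = requests.post("http://localhost:8000/v1/completions", json=payload)
print(resp.json())

If the printed JSON nests generations under "choices" rather than a top-level "text" list, the parsing inside VllmServer will not find the 'text' key, which matches the error reported here.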
-
Did you find a solution? I have the same issue.
-
I followed your suggestion and still get the same error, raised at line 9 in get_response. My script imports VllmServer from llama_index.llms.vllm, defines the suggested complete, messages_to_prompt, and completion_to_prompt functions plus a main(), and runs it under if __name__ == '__main__':.
-
I tried to rewrite the function in the VllmServer class to solve this problem:
Output:
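For anyone attempting a similar workaround, a rewrite of the response parsing along these lines is one option. This is a minimal sketch, not the library's official code; it assumes the response is either the plain vllm.entrypoints.api_server shape (a top-level "text" list) or the OpenAI-compatible shape (a "choices" list).

import json
import requests

def parse_vllm_response(response: requests.Response) -> str:
    """Extract generated text from either vLLM server flavour."""
    data = json.loads(response.content)
    if "text" in data:
        # vllm.entrypoints.api_server returns {"text": ["..."]}
        return data["text"][0]
    if "choices" in data:
        # the OpenAI-compatible server returns {"choices": [{"text": "..."}]}
        return data["choices"][0]["text"]
    raise KeyError(f"Unexpected vLLM response, no 'text' or 'choices' key: {data!r}")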
-
Issue:
I want to serve an LLM application in production, so I am hosting the LLM with vLLM and connecting my documents to it with LlamaIndex. When I tried a sample inference, I got KeyError: 'text'.
Library versions used:
vllm: 0.4.0.post1
llama_index: 0.10.42
llama_index.llms.vllm: 0.1.7
Server Setup:
I installed vllm and started a vllm server with the following command in the terminal:
python3 -m vllm.entrypoints.openai.api_server --model=/path/to/my/local/model --dtype=float16 --tensor-parallel-size=8 --quantization=awq --gpu-memory-utilization=0.7
It is hosted on localhost:8000.
I did a sanity check with a curl command:
Application Setup:
I followed the reference guide here:
https://docs.llamaindex.ai/en/stable/api_reference/llms/vllm/#llama_index.llms.vllm.VllmServer
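Following that guide, the failing call is essentially the snippet below. This is a minimal reproduction sketch, not my exact script; the api_url matches the server started above, and other parameters are left at the guide's defaults.

from llama_index.llms.vllm import VllmServer

# Point VllmServer at the locally hosted endpoint and run a single completion;
# with the OpenAI-compatible server and the versions listed above, this call raises the error.
llm = VllmServer(api_url="http://localhost:8000/v1/completions", max_new_tokens=256)
print(llm.complete("What is a black hole?"))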
KeyError: 'text'