Replies: 2 comments 1 reply
-
🤖 Hey @aiwalter, nice to see you back! Hope you're doing fantastically well. 😊 Yes, there is a way to generate outputs longer than the LLM output token length in LlamaIndex. This can be achieved by using the `stream_complete` method. Here is the relevant code:

```python
@llm_completion_callback()
def stream_complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:
    if not formatted:
        prompt = self.completion_to_prompt(prompt)

    from llama_index.langchain_helpers.streaming import (
        StreamingGeneratorCallbackHandler,
    )

    handler = StreamingGeneratorCallbackHandler()

    if not hasattr(self._llm, "streaming"):
        raise ValueError("LLM must support streaming.")
    if not hasattr(self._llm, "callbacks"):
        raise ValueError("LLM must support callbacks to use streaming.")

    self._llm.callbacks = [handler]  # type: ignore
    self._llm.streaming = True  # type: ignore

    # Run the blocking completion call in a background thread so that the
    # callback handler can stream tokens back to the caller as they arrive.
    thread = Thread(target=self.complete, args=[prompt], kwargs=kwargs)
    thread.start()

    response_gen = handler.get_response_gen()

    def gen() -> Generator[CompletionResponse, None, None]:
        text = ""
        for delta in response_gen:
            text += delta
            yield CompletionResponse(delta=delta, text=text)

    return gen()
```

In this method, a `StreamingGeneratorCallbackHandler` is attached to the underlying LangChain LLM, the blocking `complete` call runs in a separate thread, and the streamed deltas are yielded back as `CompletionResponse` objects. You can find this code in the llama_index/llms/langchain.py file.

As for the maximum token length that the LLM can output in a single completion call, I wasn't able to find this information in the repository. It might depend on the specific LLM being used, or there might be a default value set somewhere in the LLM or LlamaIndex code. If you have more information about the LLM you're using, or if there's a specific token length you're interested in, I might be able to provide a more accurate answer.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
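To make the streaming approach above concrete, here is a minimal usage sketch. It assumes the legacy `llama_index.llms.LangChainLLM` wrapper around a streaming-capable LangChain LLM; exact import paths and class names depend on your installed versions, so treat this as an illustration rather than a verified recipe.

```python
from llama_index.llms import LangChainLLM
from langchain.llms import OpenAI  # any streaming-capable LangChain LLM (assumed)

# Wrap the LangChain LLM so LlamaIndex can drive it (legacy import paths assumed).
llm = LangChainLLM(llm=OpenAI(streaming=True))

# stream_complete yields CompletionResponse objects; `delta` holds the newly
# streamed tokens and `text` holds the accumulated output so far.
full_text = ""
for chunk in llm.stream_complete("Write a detailed, multi-section report on solar energy."):
    full_text += chunk.delta

print(full_text)
```

Note that streaming changes how you receive the output rather than the model's per-call limit, so for genuinely longer outputs you would still combine it with a chaining strategy like the one discussed below.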
-
🤖 To increase the LLM output token length in LlamaIndex, you can adjust the maximum-token value configured through the provider's `max_tokens_key`. For example, the Amazon Bedrock provider defines:

```python
class AmazonProvider(Provider):
    max_tokens_key = "maxTokenCount"  # Increase this value
    ...
```

To combine this with a Llama-Index Query Engine using sequential LLM completion calls, you would need to implement a loop that makes multiple completion calls, each time appending the last output to the next input. This can be done with a loop like the following, using the `completion_with_retry` helper:

```python
def generate_long_output(client, model, initial_prompt, max_retries, num_iterations):
    output = initial_prompt
    for _ in range(num_iterations):
        request_body = PROVIDERS[model].get_request_body(
            output, {PROVIDERS[model].max_tokens_key: 8000}
        )
        response = completion_with_retry(client, model, request_body, max_retries)
        output += PROVIDERS[model].get_text_from_response(response)
    return output
```

In this example, `generate_long_output` makes several completion calls in a row, appending each response to the running output before sending it back as the next prompt.

As for the default value of the maximum token setting, you can find more information about this in the llama_index/llms/bedrock_utils.py file in the repository.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
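As a more generic illustration of the same sequential-call strategy, independent of Bedrock internals, here is a minimal sketch. The helper `complete_long` is hypothetical and not part of LlamaIndex; it only assumes an LLM object exposing the standard `complete()` method returning a response with a `.text` attribute, and a query engine built with `index.as_query_engine()`.

```python
def complete_long(llm, prompt: str, num_rounds: int = 3) -> str:
    """Hypothetical helper: chain several completion calls, feeding prior
    output back into the next prompt so the combined text can exceed the
    per-call output token limit."""
    text = llm.complete(prompt).text
    for _ in range(num_rounds - 1):
        follow_up = (
            f"{prompt}\n\nHere is what has been written so far:\n{text}\n\n"
            "Continue from where the text leaves off, without repeating it."
        )
        text += llm.complete(follow_up).text
    return text


# Sketch of combining this with a query engine: get an initial grounded answer
# from the index, then extend it with sequential LLM calls.
# query_engine = index.as_query_engine()
# initial = str(query_engine.query("Summarize the corpus in detail."))
# long_answer = complete_long(llm, "Expand the following summary:\n" + initial)
```

The trade-off with this kind of chaining is that each round re-sends the accumulated text, so the input token count grows with every call; in practice you may want to pass only the tail of the previous output as context.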
-
Is there a way in Llama-Index to create outputs longer than the LLM's output token length? E.g. by doing several sequential LLM completion calls where the previous output is always appended and passed to the next completion as additional input. Or any other strategy?