Replies: 2 comments 1 reply
-
🤖 Hey @aiwalter, nice to see you back! Hope you're doing fantastically well. 😊 Yes, there is a way to generate outputs longer than the LLM output token length in LlamaIndex. This can be achieved by using the `stream_complete` method. Here is the relevant code:

```python
@llm_completion_callback()
def stream_complete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponseGen:
    if not formatted:
        prompt = self.completion_to_prompt(prompt)

    from llama_index.langchain_helpers.streaming import (
        StreamingGeneratorCallbackHandler,
    )

    handler = StreamingGeneratorCallbackHandler()

    if not hasattr(self._llm, "streaming"):
        raise ValueError("LLM must support streaming.")
    if not hasattr(self._llm, "callbacks"):
        raise ValueError("LLM must support callbacks to use streaming.")

    self._llm.callbacks = [handler]  # type: ignore
    self._llm.streaming = True  # type: ignore

    # Run the blocking completion call in a background thread so that the
    # callback handler can stream tokens back to the caller as they arrive.
    thread = Thread(target=self.complete, args=[prompt], kwargs=kwargs)
    thread.start()

    response_gen = handler.get_response_gen()

    def gen() -> Generator[CompletionResponse, None, None]:
        text = ""
        for delta in response_gen:
            text += delta
            yield CompletionResponse(delta=delta, text=text)

    return gen()
```

In this method, a `StreamingGeneratorCallbackHandler` is attached to the underlying LangChain LLM, the blocking `complete` call runs in a separate thread, and the streamed deltas are yielded back as `CompletionResponse` objects. You can find this code in the llama_index/llms/langchain.py file.

As for the maximum token length that the LLM can output in a single completion call, I wasn't able to find this information in the repository. It might depend on the specific LLM being used, or there might be a default value set somewhere in the LLM or LlamaIndex code. If you have more information about the LLM you're using, or if there's a specific token length you're interested in, I might be able to provide a more accurate answer.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
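To make the streaming approach above concrete, here is a minimal usage sketch. It assumes the legacy `llama_index.llms.LangChainLLM` wrapper around a streaming-capable LangChain LLM; exact import paths and class names depend on your installed versions, so treat this as an illustration rather than a verified recipe.

```python
from llama_index.llms import LangChainLLM
from langchain.llms import OpenAI  # any streaming-capable LangChain LLM (assumed)

# Wrap the LangChain LLM so LlamaIndex can drive it (legacy import paths assumed).
llm = LangChainLLM(llm=OpenAI(streaming=True))

# stream_complete yields CompletionResponse objects; `delta` holds the newly
# streamed tokens and `text` holds the accumulated output so far.
full_text = ""
for chunk in llm.stream_complete("Write a detailed, multi-section report on solar energy."):
    full_text += chunk.delta

print(full_text)
```

Note that streaming changes how you receive the output rather than the model's per-call limit, so for genuinely longer outputs you would still combine it with a chaining strategy like the one discussed below.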
-
🤖 To increase the LLM output token length in LlamaIndex, you can adjust the maximum-token value configured through the provider's `max_tokens_key`. For example, the Amazon Bedrock provider defines:

```python
class AmazonProvider(Provider):
    max_tokens_key = "maxTokenCount"  # Increase this value
    ...
```

To combine this with a Llama-Index Query Engine using sequential LLM completion calls, you would need to implement a loop that makes multiple completion calls, each time appending the last output to the next input. This can be done with a loop like the following, using the `completion_with_retry` helper:

```python
def generate_long_output(client, model, initial_prompt, max_retries, num_iterations):
    output = initial_prompt
    for _ in range(num_iterations):
        request_body = PROVIDERS[model].get_request_body(
            output, {PROVIDERS[model].max_tokens_key: 8000}
        )
        response = completion_with_retry(client, model, request_body, max_retries)
        output += PROVIDERS[model].get_text_from_response(response)
    return output
```

In this example, `generate_long_output` makes several completion calls in a row, appending each response to the running output before sending it back as the next prompt.

As for the default value of the maximum token setting, you can find more information about this in the llama_index/llms/bedrock_utils.py file in the repository.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
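As a more generic illustration of the same sequential-call strategy, independent of Bedrock internals, here is a minimal sketch. The helper `complete_long` is hypothetical and not part of LlamaIndex; it only assumes an LLM object exposing the standard `complete()` method returning a response with a `.text` attribute, and a query engine built with `index.as_query_engine()`.

```python
def complete_long(llm, prompt: str, num_rounds: int = 3) -> str:
    """Hypothetical helper: chain several completion calls, feeding prior
    output back into the next prompt so the combined text can exceed the
    per-call output token limit."""
    text = llm.complete(prompt).text
    for _ in range(num_rounds - 1):
        follow_up = (
            f"{prompt}\n\nHere is what has been written so far:\n{text}\n\n"
            "Continue from where the text leaves off, without repeating it."
        )
        text += llm.complete(follow_up).text
    return text


# Sketch of combining this with a query engine: get an initial grounded answer
# from the index, then extend it with sequential LLM calls.
# query_engine = index.as_query_engine()
# initial = str(query_engine.query("Summarize the corpus in detail."))
# long_answer = complete_long(llm, "Expand the following summary:\n" + initial)
```

The trade-off with this kind of chaining is that each round re-sends the accumulated text, so the input token count grows with every call; in practice you may want to pass only the tail of the previous output as context.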
-
Is there a way in Llama-Index to create outputs longer than the LLM's output token length? E.g. by doing several sequential LLM completion calls where the previous output is always appended and passed to the next completion as additional input. Or any other strategy?