
Fix BedrockConverse token count in usage to match OpenAI's format #18383


Open
ArmandBriere wants to merge 2 commits into main

Conversation

ArmandBriere commented Apr 6, 2025

Description

Motivation: I'm currently using LlamaIndex with Bedrock and Langfuse. Llama3 token count tracking is not working with BedrockConverse; I made a fix and explain it below.

I updated the raw output of the BedrockConverse base class, which had the wrong usage format.

Looking at the Langfuse traces, we can see that the previous implementation was simply appending the prompt_tokens to the end of the traces.

This is due to the use of additional_kwargs in the ChatResponse object.

Renaming the keys to prompt_tokens, completion_tokens, and total_tokens directly in the response['usage'] dictionary fixes the issue and follows OpenAI's format.
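
In short, Bedrock Converse reports usage as inputTokens/outputTokens/totalTokens, and the fix renames those keys to OpenAI's names. A minimal sketch of the idea (the helper name here is illustrative, not the actual _get_response_token_counts implementation):

def to_openai_usage(bedrock_usage: dict) -> dict:
    """Rename Bedrock Converse usage keys to OpenAI's format (illustrative sketch)."""
    return {
        "prompt_tokens": bedrock_usage.get("inputTokens", 0),
        "completion_tokens": bedrock_usage.get("outputTokens", 0),
        "total_tokens": bedrock_usage.get("totalTokens", 0),
    }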

Here is some Python code that runs BedrockConverse with Langfuse tracing to observe the impact:

from typing import Any, Sequence
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel
from llama_index.llms.bedrock_converse.utils import (
    messages_to_converse_messages,
    converse_with_retry,
)

from llama_index.core.base.llms.types import (
    ChatMessage,
    ChatResponse,
    MessageRole,
)
from llama_index.core.llms.callbacks import (
    llm_chat_callback,
)

# Setup Langfuse Instrumentor to track everything from LlamaIndex
instrumentor: LlamaIndexInstrumentor = LlamaIndexInstrumentor(
    host="http://localhost:3000",
)

# Start the instrumentor
instrumentor.start()


# Pydantic model for the output
class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: list[Song]


class SerialSeven(BaseModel):
    """Data model for the serial number."""

    serial_number: list[int]


class CustomBedrockConverse(BedrockConverse):
    """Custom Bedrock Converse class to override wrong token count."""

    @llm_chat_callback()
    def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:
        # convert Llama Index messages to AWS Bedrock Converse messages
        converse_messages, system_prompt = messages_to_converse_messages(messages)
        all_kwargs = self._get_all_kwargs(**kwargs)

        # invoke LLM in AWS Bedrock Converse with retry
        response = converse_with_retry(
            client=self._client,
            messages=converse_messages,
            system_prompt=system_prompt,
            max_retries=self.max_retries,
            stream=False,
            guardrail_identifier=self.guardrail_identifier,
            guardrail_version=self.guardrail_version,
            trace=self.trace,
            **all_kwargs,
        )

        content, tool_calls, tool_call_ids, status = self._get_content_and_tool_calls(
            response
        )

        dict_response = dict(response)
        # Add Bedrock's token count to usage dict to match OpenAI's format
        dict_response["usage"] = self._get_response_token_counts(dict_response)

        return ChatResponse(
            message=ChatMessage(
                role=MessageRole.ASSISTANT,
                content=content,
                additional_kwargs={
                    "tool_calls": tool_calls,
                    "tool_call_id": tool_call_ids,
                    "status": status,
                },
            ),
            raw=dict_response,
            additional_kwargs=self._get_response_token_counts(dict(response)),
        )


llm = CustomBedrockConverse(
    model="meta.llama3-8b-instruct-v1:0",
)

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""

program: LLMTextCompletionProgram = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
    llm=llm,
)


# Define a function to run the workflow
def run_workflow():
    # Create a Langfuse trace
    with instrumentor.observe():
        output = program(movie_name="The shining")

    return output


if __name__ == "__main__":
    print(run_workflow())

    instrumentor.flush()

Running this code threw the following warning:

python main.py
Usage object must have either {input, output, total, unit} or {promptTokens, completionTokens, totalTokens}
Traceback (most recent call last):
  File "~/llm-playground/.venv/lib/python3.12/site-packages/langfuse/client.py", line 2722, in update
    "usage": _convert_usage_input(usage) if usage is not None else None,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/llm-playground/.venv/lib/python3.12/site-packages/langfuse/utils/__init__.py", line 103, in _convert_usage_input
    raise ValueError(
ValueError: Usage object must have either {input, output, total, unit} or {promptTokens, completionTokens, totalTokens}
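
For context, the error message shows the two usage shapes Langfuse accepts (illustrative values; key names are taken from the error above):

# Either OpenAI-style keys...
usage = {"promptTokens": 42, "completionTokens": 18, "totalTokens": 60}
# ...or the generic form (the "TOKENS" unit value is an assumption)
usage = {"input": 42, "output": 18, "total": 60, "unit": "TOKENS"}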

Here is the Langfuse trace output for the execution above:

[image: Langfuse trace output for the execution above]

The new implementation removes the ValueError from above and fixes the usage values:

[image: Langfuse trace with the fix applied]

Langfuse trace example with OpenAI client:

[image: Langfuse trace with the OpenAI client]

No code was removed, to ensure backward compatibility.

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 6, 2025
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 9, 2025
ArmandBriere (Author)

I'm unable to test the streaming change, since streaming isn't working with Langfuse either before or after the change.
I've tried to debug this using the code below: when calling the stream function, the callback used is not wrapped_async_llm_chat as I would have guessed, but the wrapped_llm_chat decorator. This results in the complete absence of the Assistant logs in Langfuse.

import uuid

from dotenv import load_dotenv
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core.base.llms.types import ChatMessage, ChatResponseGen
from llama_index.llms.bedrock_converse import BedrockConverse

load_dotenv()

# Setup Langfuse Instrumentor to track everything from LlamaIndex
instrumentor: LlamaIndexInstrumentor = LlamaIndexInstrumentor()


# Bedrock model name
model = "meta.llama3-8b-instruct-v1:0"

# Initialize Bedrock LLM
llm = BedrockConverse(
    model=model,
    max_retries=3,
)


def stream_llm_on_bedrock():
    """Stream the LLM output."""
    session_id: str = uuid.uuid4().hex
    user_id: str = uuid.uuid4().hex

    with instrumentor.observe(session_id=session_id, user_id=user_id):
        response: ChatResponseGen = llm.stream_chat(
            messages=[
                ChatMessage(role="user", content="Hello"),
            ]
        )

        for r in response:
            print(r.delta, end="", flush=True)


# Example usage
if __name__ == "__main__":
    # Start the instrumentor
    instrumentor.start()

    stream_llm_on_bedrock()

    # Flush the instrumentor to send all data to Langfuse
    instrumentor.flush()

I might open a new issue for this problem if you don't have a clear fix off the top of your head.

logan-markewich (Collaborator)

stream_chat is a sync function, so using the sync wrapped_llm_chat makes sense.

It could be that Langfuse just doesn't handle token counts on streaming responses?

@@ -367,6 +371,10 @@ def stream_chat(
             **all_kwargs,
         )
 
+        dict_response = dict(response)
+        # Add Bedrock's token count to usage dict to match OpenAI's format
+        dict_response["usage"] = self._get_response_token_counts(dict_response)
Collaborator

You'll actually need to get the usage on each chunk in the stream? I think? You might need to debug what's being emitted in the chunk stream below to get the usage properly.

Collaborator

It can't know the complete usage until the stream is finished, for example.
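
For what it's worth, with Bedrock's ConverseStream API the token totals only arrive in the trailing metadata event, so a streaming wrapper can only fill in usage once the stream ends. A rough sketch of that idea (event shapes follow the ConverseStream API; the generator itself is illustrative, not the actual stream_chat implementation):

def iter_stream_with_usage(response: dict):
    """Yield text deltas and capture usage from the trailing metadata event (sketch)."""
    usage: dict = {}
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            # text chunks arrive incrementally
            yield event["contentBlockDelta"]["delta"].get("text", "")
        elif "metadata" in event:
            # token counts are only known here, after all content events
            u = event["metadata"].get("usage", {})
            usage = {
                "prompt_tokens": u.get("inputTokens", 0),
                "completion_tokens": u.get("outputTokens", 0),
                "total_tokens": u.get("totalTokens", 0),
            }
    # usage can now be attached to the final ChatResponse's raw dict / additional_kwargs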
