Fix BedrockConverse token count in usage to match OpenAI's format #18383
Conversation
...ntegrations/llms/llama-index-llms-bedrock-converse/llama_index/llms/bedrock_converse/base.py
I'm unable to test the streaming change, since it's not working before and after the change with Langfuse.

```python
import uuid

from dotenv import load_dotenv
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core.base.llms.types import ChatMessage, ChatResponseGen
from llama_index.llms.bedrock_converse import BedrockConverse

load_dotenv()

# Setup Langfuse Instrumentor to track everything from LlamaIndex
instrumentor: LlamaIndexInstrumentor = LlamaIndexInstrumentor()

# Bedrock model name
model = "meta.llama3-8b-instruct-v1:0"

# Initialize Bedrock LLM
llm = BedrockConverse(
    model=model,
    max_retries=3,
)


def stream_llm_on_bedrock():
    """Stream the LLM output."""
    session_id: str = uuid.uuid4().hex
    user_id: str = uuid.uuid4().hex
    with instrumentor.observe(session_id=session_id, user_id=user_id):
        response: ChatResponseGen = llm.stream_chat(
            messages=[
                ChatMessage(role="user", content="Hello"),
            ]
        )
        for r in response:
            print(r.delta, end="", flush=True)


# Example usage
if __name__ == "__main__":
    # Start the instrumentor
    instrumentor.start()
    stream_llm_on_bedrock()
    # Flush the instrumentor to send all data to Langfuse
    instrumentor.flush()
```

I might open a new issue for this problem if you don't have any clear fix off the top of your head.
It could be that Langfuse just doesn't handle token counts on streaming responses?
```
@@ -367,6 +371,10 @@ def stream_chat(
            **all_kwargs,
        )

        dict_response = dict(response)
        # Add Bedrock's token count to usage dict to match OpenAI's format
        dict_response["usage"] = self._get_response_token_counts(dict_response)
```
You'll actually need to get the usage on each chunk in the stream? I think? You might need to debug what's being emitted in the chunk stream below to get the usage properly.
It can't know the complete usage until the stream is finished, for example.
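To make that concrete, here is a minimal sketch of collecting usage from a stream, assuming (as with Bedrock's ConverseStream response shape) that the token counts only arrive in a trailing metadata event; the function name and event handling are illustrative assumptions, not the integration's actual code:

```python
# Minimal sketch, not the PR's implementation: usage can only be captured
# once the stream's trailing metadata event arrives. The event and key names
# ("contentBlockDelta", "metadata", "inputTokens", ...) are assumptions based
# on Bedrock's ConverseStream response shape.
def collect_stream_usage(event_stream) -> dict:
    usage: dict = {}
    for event in event_stream:
        if "contentBlockDelta" in event:
            # Text deltas would be yielded to the caller from here.
            continue
        if "metadata" in event:
            # The final event carries token counts for the whole exchange;
            # rename them to OpenAI's format as this PR does for chat().
            raw = event["metadata"].get("usage", {})
            usage = {
                "prompt_tokens": raw.get("inputTokens", 0),
                "completion_tokens": raw.get("outputTokens", 0),
                "total_tokens": raw.get("totalTokens", 0),
            }
    return usage
```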
Description

Motivation: I'm currently using LlamaIndex with Bedrock and Langfuse. Llama 3 token count tracking is not working with `BedrockConverse`, so I made a fix and explained it all here.

I updated the `raw` output of the `BedrockConverse` base class, which had the wrong usage format. Looking at Langfuse traces, we can see that the previous implementation was simply adding the `prompt_token` count to the end of the traces. This is due to the use of `additional_kwargs` in the `ChatResponse` object. Renaming the keys to `prompt_tokens`, `completion_tokens`, and `total_tokens` directly in the `response['usage']` dictionary fixes the issue and follows OpenAI's format.
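As a rough illustration of what that renaming amounts to (the values are made up, and the Bedrock-side key names are an assumption based on the Converse API's usage block rather than something quoted in this PR):

```python
# Bedrock Converse reports token usage roughly like this (assumed key names):
bedrock_usage = {"inputTokens": 12, "outputTokens": 34, "totalTokens": 46}

# After this fix, response["usage"] exposes OpenAI-style key names instead:
openai_style_usage = {
    "prompt_tokens": 12,
    "completion_tokens": 34,
    "total_tokens": 46,
}
```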
Here is some Python code to run Bedrock Converse with Langfuse's tracing to observe the impact (the script shared earlier in this conversation):
Running this code was throwing the following warning:
Here is the Langfuse trace output for the execution above:
The new implementation removes the ValueError from above and fixes the usage values:
Langfuse trace example with OpenAI client:
No code was removed, to ensure backward compatibility.
Fixes # (issue)
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?

Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)

Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
`make format; make lint` to appease the lint gods