Python: Emit token usage with streaming chat completion agent. #12416
Motivation and Context
The chat completion agent was not emitting token usage during streaming invocation because we only let messages through when `response.items` was non-empty. For token-usage chunks, `response.items` is `[]` and the usage is carried in the message's `metadata` dict. This PR fixes that bug by letting a message through when `response.items or response.metadata.get("usage")` is truthy. Two new samples are added to the concepts/agents/chat_completion dir to show how to track token usage for streaming and non-streaming agent invocation (a general sketch of the pattern follows below). Token usage handling is also added to the chat completion agent integration tests.
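A minimal sketch of consuming the streamed usage from the caller's side, assuming an OpenAI chat completion service, the `invoke_stream(messages=...)` entry point, a `.message` attribute on each streamed item, and `prompt_tokens`/`completion_tokens` fields on the usage object; this is not one of the samples added by the PR.

```python
# Sketch (assumptions noted above): accumulate token usage while streaming
# from a ChatCompletionAgent.
import asyncio

from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion


async def main() -> None:
    agent = ChatCompletionAgent(
        service=OpenAIChatCompletion(),
        name="Assistant",
        instructions="Answer questions concisely.",
    )

    prompt_tokens = 0
    completion_tokens = 0

    async for response in agent.invoke_stream(messages="Why is the sky blue?"):
        message = response.message
        # Regular chunks carry content; the usage chunk has no items but
        # exposes a usage object in the message metadata.
        if message.content:
            print(message.content, end="", flush=True)
        usage = (message.metadata or {}).get("usage")
        if usage:
            prompt_tokens += usage.prompt_tokens
            completion_tokens += usage.completion_tokens

    print(f"\nPrompt tokens: {prompt_tokens}, completion tokens: {completion_tokens}")


if __name__ == "__main__":
    asyncio.run(main())
```

Non-streaming invocation can read the same `metadata["usage"]` entry from the returned message.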
Description

This PR also adds handling for the `prompt_tokens_details` and `completion_tokens_details` models that are returned but were not previously handled.

Contribution Checklist