Python: Emit token usage with streaming chat completion agent. #12416
Merged
moonbox3 merged 2 commits into microsoft:main from moonbox3:chat-complete-agent-stream-usage, Jun 9, 2025
110 changes: 110 additions & 0 deletions
...ples/concepts/agents/chat_completion_agent/chat_completion_agent_streaming_token_usage.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# Copyright (c) Microsoft. All rights reserved. | ||
|
||
import asyncio | ||
from typing import Annotated | ||
|
||
from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread | ||
from semantic_kernel.connectors.ai.completion_usage import CompletionUsage | ||
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion | ||
from semantic_kernel.functions import kernel_function | ||
|
||
""" | ||
The following sample demonstrates how to create a chat completion agent | ||
and use it with streaming responses. It also shows how to track token | ||
usage during the streaming process. | ||
""" | ||
|
||
|
||
# Define a plugin used by this sample | ||
class MenuPlugin: | ||
"""A sample Menu Plugin used for the concept sample.""" | ||
|
||
@kernel_function(description="Provides a list of specials from the menu.") | ||
def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]: | ||
return """ | ||
Special Soup: Clam Chowder | ||
Special Salad: Cobb Salad | ||
Special Drink: Chai Tea | ||
""" | ||
|
||
@kernel_function(description="Provides the price of the requested menu item.") | ||
def get_item_price( | ||
self, menu_item: Annotated[str, "The name of the menu item."] | ||
) -> Annotated[str, "Returns the price of the menu item."]: | ||
return "$9.99" | ||
|
||
|
||
async def main() -> None: | ||
agent = ChatCompletionAgent( | ||
service=OpenAIChatCompletion(), | ||
name="Assistant", | ||
instructions="Answer questions about the menu.", | ||
plugins=[MenuPlugin()], | ||
) | ||
|
||
# Start without a thread. | ||
# If no thread is provided, a new thread will be | ||
# created and returned with the initial response | ||
thread: ChatHistoryAgentThread | None = None | ||
|
||
user_inputs = [ | ||
"Hello", | ||
"What is the special soup?", | ||
"How much does that cost?", | ||
"Thank you", | ||
] | ||
|
||
completion_usage = CompletionUsage() | ||
|
||
for user_input in user_inputs: | ||
print(f"\n# User: '{user_input}'") | ||
async for response in agent.invoke_stream( | ||
messages=user_input, | ||
thread=thread, | ||
): | ||
if response.content: | ||
print(response.content, end="", flush=True) | ||
if response.metadata.get("usage"): | ||
completion_usage += response.metadata["usage"] | ||
print(f"\nStreaming Usage: {response.metadata['usage']}") | ||
thread = response.thread | ||
print() | ||
|
||
# Print the completion usage | ||
print(f"\nStreaming Total Completion Usage: {completion_usage.model_dump_json(indent=4)}") | ||
|
||
""" | ||
Sample Output: | ||
|
||
# User: 'Hello' | ||
Hello! How can I help you with the menu today? | ||
|
||
# User: 'What is the special soup?' | ||
The special soup today is Clam Chowder. Would you like more details or are you interested in something else from | ||
the menu? | ||
|
||
# User: 'How much does that cost?' | ||
The Clam Chowder special soup costs $9.99. Would you like to add it to your order or ask about something else? | ||
|
||
# User: 'Thank you' | ||
You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your meal! | ||
|
||
Streaming Total Completion Usage: { | ||
"prompt_tokens": 1150, | ||
"prompt_tokens_details": { | ||
"audio_tokens": 0, | ||
"cached_tokens": 0 | ||
}, | ||
"completion_tokens": 134, | ||
"completion_tokens_details": { | ||
"accepted_prediction_tokens": 0, | ||
"audio_tokens": 0, | ||
"reasoning_tokens": 0, | ||
"rejected_prediction_tokens": 0 | ||
} | ||
} | ||
""" | ||
|
||
|
||
if __name__ == "__main__": | ||
asyncio.run(main()) |
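The accumulation pattern in the streaming loop can be tried without any API key. The following is a minimal sketch, not the library's implementation: `FakeUsage`, `FakeChunk`, `fake_stream`, and `collect` are hypothetical stand-ins invented for illustration, and the assumption that only the final chunk carries a `usage` entry is made for the example, not stated by the sample.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeUsage:
    """Hypothetical stand-in for CompletionUsage: two counters only."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def __add__(self, other: "FakeUsage") -> "FakeUsage":
        # Return a new object, so `total += usage` rebinds `total`.
        return FakeUsage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
        )


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed response item."""

    content: str
    metadata: dict = field(default_factory=dict)


async def fake_stream():
    """Yield text chunks; only the last one carries usage (an assumption)."""
    yield FakeChunk("Hello")
    yield FakeChunk(" there")
    yield FakeChunk("", {"usage": FakeUsage(prompt_tokens=10, completion_tokens=2)})


async def collect(stream) -> tuple[str, FakeUsage]:
    """Concatenate streamed text and sum usage from chunks that report it."""
    text, total = "", FakeUsage()
    async for chunk in stream:
        if chunk.content:
            text += chunk.content
        if chunk.metadata.get("usage"):
            total += chunk.metadata["usage"]
    return text, total


text, total = asyncio.run(collect(fake_stream()))
print(text)
print(total)
```

Checking `metadata.get("usage")` before adding mirrors the sample: chunks without usage are skipped instead of raising a `KeyError`.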
111 changes: 111 additions & 0 deletions
python/samples/concepts/agents/chat_completion_agent/chat_completion_agent_token_usage.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
# Copyright (c) Microsoft. All rights reserved. | ||
|
||
import asyncio | ||
from typing import Annotated | ||
|
||
from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread | ||
from semantic_kernel.connectors.ai.completion_usage import CompletionUsage | ||
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion | ||
from semantic_kernel.functions import kernel_function | ||
|
||
""" | ||
The following sample demonstrates how to create a chat completion agent | ||
and use it with non-streaming responses. It also shows how to track token | ||
usage during agent invocation. | ||
""" | ||
|
||
|
||
# Define a plugin used by this sample | ||
class MenuPlugin: | ||
"""A sample Menu Plugin used for the concept sample.""" | ||
|
||
@kernel_function(description="Provides a list of specials from the menu.") | ||
def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]: | ||
return """ | ||
Special Soup: Clam Chowder | ||
Special Salad: Cobb Salad | ||
Special Drink: Chai Tea | ||
""" | ||
|
||
@kernel_function(description="Provides the price of the requested menu item.") | ||
def get_item_price( | ||
self, menu_item: Annotated[str, "The name of the menu item."] | ||
) -> Annotated[str, "Returns the price of the menu item."]: | ||
return "$9.99" | ||
|
||
|
||
async def main() -> None: | ||
agent = ChatCompletionAgent( | ||
service=OpenAIChatCompletion(), | ||
name="Assistant", | ||
instructions="Answer questions about the menu.", | ||
plugins=[MenuPlugin()], | ||
) | ||
|
||
# Start without a thread. | ||
# If no thread is provided, a new thread will be | ||
# created and returned with the initial response | ||
thread: ChatHistoryAgentThread | None = None | ||
|
||
user_inputs = [ | ||
"Hello", | ||
"What is the special soup?", | ||
"How much does that cost?", | ||
"Thank you", | ||
] | ||
|
||
completion_usage = CompletionUsage() | ||
|
||
for user_input in user_inputs: | ||
print(f"\n# User: '{user_input}'") | ||
async for response in agent.invoke( | ||
messages=user_input, | ||
thread=thread, | ||
): | ||
if response.content: | ||
print(response.content) | ||
if response.metadata.get("usage"): | ||
completion_usage += response.metadata["usage"] | ||
thread = response.thread | ||
print() | ||
|
||
# Print the completion usage | ||
print(f"\nNon-Streaming Total Completion Usage: {completion_usage.model_dump_json(indent=4)}") | ||
|
||
""" | ||
Sample Output: | ||
|
||
# User: 'Hello' | ||
Hello! How can I help you with the menu today? | ||
|
||
|
||
# User: 'What is the special soup?' | ||
The special soup today is Clam Chowder. Would you like to know more about it or see the other specials? | ||
|
||
|
||
# User: 'How much does that cost?' | ||
The Clam Chowder special costs $9.99. Would you like to add that to your order or need more information? | ||
|
||
|
||
# User: 'Thank you' | ||
You're welcome! If you have any more questions or need help with the menu, just let me know. Enjoy your day! | ||
|
||
Non-Streaming Total Completion Usage: { | ||
"prompt_tokens": 772, | ||
"prompt_tokens_details": { | ||
"audio_tokens": 0, | ||
"cached_tokens": 0 | ||
}, | ||
"completion_tokens": 92, | ||
"completion_tokens_details": { | ||
"accepted_prediction_tokens": 0, | ||
"audio_tokens": 0, | ||
"reasoning_tokens": 0, | ||
"rejected_prediction_tokens": 0 | ||
} | ||
} | ||
""" | ||
|
||
|
||
if __name__ == "__main__": | ||
asyncio.run(main()) |
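The usage JSON printed by the samples reports prompt and completion tokens separately. As a small worked example, the sketch below sums the two counts for the streaming and non-streaming sample outputs shown in this PR. The helper `total_tokens` is a hypothetical name introduced here, and the nested `*_details` objects are left out for brevity.

```python
import json

# Token counts copied from the two sample outputs (details objects omitted).
streaming = json.loads('{"prompt_tokens": 1150, "completion_tokens": 134}')
non_streaming = json.loads('{"prompt_tokens": 772, "completion_tokens": 92}')


def total_tokens(usage: dict) -> int:
    """Sum prompt and completion tokens, treating missing or null counts as 0."""
    return (usage.get("prompt_tokens") or 0) + (usage.get("completion_tokens") or 0)


print(total_tokens(streaming))      # 1284
print(total_tokens(non_streaming))  # 864
```

The two sample runs produced different replies, so these totals are illustrative and not a like-for-like comparison of streaming against non-streaming cost.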
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,27 +1,56 @@ | ||
# Copyright (c) Microsoft. All rights reserved. | ||
|
||
|
||
from openai.types import CompletionUsage as OpenAICompletionUsage | ||
from openai.types.completion_usage import CompletionTokensDetails, PromptTokensDetails | ||
|
||
from semantic_kernel.kernel_pydantic import KernelBaseModel | ||
|
||
|
||
class CompletionUsage(KernelBaseModel): | ||
-"""Completion usage information.""" | ||
+"""A class representing the usage of tokens in a completion request.""" | ||
|
||
prompt_tokens: int | None = None | ||
prompt_tokens_details: PromptTokensDetails | None = None | ||
completion_tokens: int | None = None | ||
completion_tokens_details: CompletionTokensDetails | None = None | ||
|
||
@classmethod | ||
def from_openai(cls, openai_completion_usage: OpenAICompletionUsage): | ||
-"""Create a CompletionUsage object from an OpenAI response.""" | ||
+"""Create a CompletionUsage instance from an OpenAICompletionUsage instance.""" | ||
return cls( | ||
prompt_tokens=openai_completion_usage.prompt_tokens, | ||
prompt_tokens_details=openai_completion_usage.prompt_tokens_details | ||
if openai_completion_usage.prompt_tokens_details | ||
else None, | ||
completion_tokens=openai_completion_usage.completion_tokens, | ||
completion_tokens_details=openai_completion_usage.completion_tokens_details | ||
if openai_completion_usage.completion_tokens_details | ||
else None, | ||
) | ||
|
||
def __add__(self, other: "CompletionUsage") -> "CompletionUsage": | ||
-"""Add two CompletionUsage objects.""" | ||
+"""Combine two CompletionUsage instances by summing their token counts.""" | ||
|
||
def _merge_details(cls, a, b): | ||
"""Merge two details objects by summing their fields.""" | ||
if a is None and b is None: | ||
return None | ||
kwargs = {} | ||
for field in cls.__annotations__: | ||
x = getattr(a, field, None) | ||
y = getattr(b, field, None) | ||
value = None if x is None and y is None else (x or 0) + (y or 0) | ||
kwargs[field] = value | ||
return cls(**kwargs) | ||
|
||
return CompletionUsage( | ||
prompt_tokens=(self.prompt_tokens or 0) + (other.prompt_tokens or 0), | ||
completion_tokens=(self.completion_tokens or 0) + (other.completion_tokens or 0), | ||
prompt_tokens_details=_merge_details( | ||
PromptTokensDetails, self.prompt_tokens_details, other.prompt_tokens_details | ||
), | ||
completion_tokens_details=_merge_details( | ||
CompletionTokensDetails, self.completion_tokens_details, other.completion_tokens_details | ||
), | ||
) |
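The `None` handling in `_merge_details` is easy to misread, so here is a minimal standalone sketch of the same rule using a plain dataclass. `Details` and `merge_details` are hypothetical stand-ins for illustration; the real code operates on the OpenAI `PromptTokensDetails` and `CompletionTokensDetails` models. The rule shown: the result is `None` only when both inputs are `None`, and a field stays `None` only when both sides lack a value.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Details:
    """Hypothetical stand-in for a token-details model with optional counters."""

    audio_tokens: Optional[int] = None
    cached_tokens: Optional[int] = None


def merge_details(a: Optional[Details], b: Optional[Details]) -> Optional[Details]:
    """Sum two details objects field by field, keeping None only when both sides lack a value."""
    if a is None and b is None:
        return None
    merged = {}
    for f in fields(Details):
        x = getattr(a, f.name, None)  # getattr on None falls back to the default
        y = getattr(b, f.name, None)
        merged[f.name] = None if x is None and y is None else (x or 0) + (y or 0)
    return Details(**merged)


print(merge_details(None, None))
# None
print(merge_details(Details(audio_tokens=3), None))
# Details(audio_tokens=3, cached_tokens=None)
print(merge_details(Details(1, 2), Details(4, None)))
# Details(audio_tokens=5, cached_tokens=2)
```

The top-level counts in `__add__` behave differently: `(self.prompt_tokens or 0) + (other.prompt_tokens or 0)` always yields an integer, so summing two empty `CompletionUsage` objects gives `0` for those fields, not `None`.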