.Net: fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in AddStreamingMessageAsync #10911
Conversation
@microsoft-github-policy-service agree
@MadLongTom thanks for the contribution. Do you have any tests or samples that demonstrate the issue you are fixing?
@markwallace-microsoft I've written a test sample in https://github.com/MadLongTom/semantic-kernel/tree/sample/dotnet/AutoInvokeTest |
Hi @MadLongTom, thank you for your PR. I tried to reproduce the issue but was unable to do so. While attempting this, I distilled your sample into this relatively simple test. Could you please run it and show the duplicated/incorrect content in the chat history? Alternatively, if I missed something from your sample that makes the issue non-reproducible, could you please fix this test?

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Xunit;

[Fact]
public async Task ReproduceContentBufferIssue()
{
    IChatCompletionService chatCompletion = new OpenAIChatCompletionService(
        modelId: TestConfiguration.OpenAI.ChatModelId,
        apiKey: TestConfiguration.OpenAI.ApiKey);

    Kernel kernel = new();
    kernel.ImportPluginFromType<Plugin>();

    ChatHistory chatHistory = new("you are a helpful assistant.");
    chatHistory.AddUserMessage("What is the current weather in London?");

    OpenAIPromptExecutionSettings requestSettings = new()
    {
        ExtensionData = new Dictionary<string, object>(),
        MaxTokens = 4096,
        Temperature = 0.6,
        TopP = 0.95,
        ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
    };

    // Stream the completion through AddStreamingMessageAsync so the
    // accumulated content is committed to the chat history.
    IAsyncEnumerable<StreamingChatMessageContent> results = chatHistory.AddStreamingMessageAsync(
        streamingMessageContents: chatCompletion.GetStreamingChatMessageContentsAsync(chatHistory, requestSettings, kernel).Cast<OpenAIStreamingChatMessageContent>(),
        includeToolCalls: true);

    StringBuilder fullMessage = new();
    await foreach (var completionResult in results)
    {
        fullMessage.Append(completionResult);
    }

    Console.WriteLine(fullMessage.ToString());
}

public class Plugin
{
    [KernelFunction]
    public string GetCurrentTime()
    {
        return DateTime.Now.ToString();
    }

    [KernelFunction]
    public string GetCurrentWeather(string location, string time)
    {
        return $"The weather in {location} at {time} is 72 degrees.";
    }
}
```
@SergeyMenshykh Thank you for writing the test case. I noticed that you are probably testing with a model that has no reasoning capabilities, so there is no text content body in the first assistant function-call reply, only function call content. In the bug issue I submitted, I used the QwQ:32b model, which natively supports both reasoning and function calling. The reasoning process is written to the assistant text content before each function call. I've serialized a ChatHistory JSON of my test results:

```json
[
{
"Role": {
"Label": "system"
},
"Items": [
{
"$type": "TextContent",
"Text": "you are a helpful assistant."
}
]
},
{
"Role": {
"Label": "user"
},
"Items": [
{
"$type": "TextContent",
"Text": "what time is it now?"
}
]
},
{
"Role": {
"Label": "Assistant"
},
"Items": [
{
"$type": "TextContent",
"Text": "\u003Cthink\u003E\nOkay, the user is asking, \u0022what time is it now?\u0022 I need to figure out which tool to use here. Let me check the available functions. There\u0027s TimeUtil-GetCurrentTime, which gets the current date and time. The parameters require a callid, but the user didn\u0027t provide one. The description says to pass any value, so maybe I can just use a placeholder like \u0022user_request\u0022 or something. The Memory-GetTextSearchResults is for searching content, which isn\u0027t needed here. So I should call TimeUtil-GetCurrentTime with a callid parameter. Let me make sure the parameters are correct. The function requires callid as a string, so I\u0027ll set it to \u0022user_query\u0022. Alright, that should work.\n\u003C/think\u003E\n\n",
"ModelId": "QwQ",
"Metadata": {
"CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
"CreatedAt": "2025-03-15T07:35:23\u002B00:00",
"SystemFingerprint": null,
"RefusalUpdate": null,
"Usage": {
"OutputTokenCount": 183,
"InputTokenCount": 315,
"TotalTokenCount": 498,
"OutputTokenDetails": null,
"InputTokenDetails": null
},
"FinishReason": null,
"ChatResponseMessage.FunctionToolCalls": [
{
"Kind": 0,
"FunctionName": "TimeUtil-GetCurrentTime",
"FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
"Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
}
]
}
},
{
"$type": "FunctionCallContent",
"Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080",
"PluginName": "TimeUtil",
"FunctionName": "GetCurrentTime",
"Arguments": {
"callid": "user_query"
}
}
],
"ModelId": "QwQ",
"Metadata": {
"CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
"CreatedAt": "2025-03-15T07:35:23\u002B00:00",
"SystemFingerprint": null,
"RefusalUpdate": null,
"Usage": {
"OutputTokenCount": 183,
"InputTokenCount": 315,
"TotalTokenCount": 498,
"OutputTokenDetails": null,
"InputTokenDetails": null
},
"FinishReason": null,
"ChatResponseMessage.FunctionToolCalls": [
{
"Kind": 0,
"FunctionName": "TimeUtil-GetCurrentTime",
"FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
"Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
}
]
}
},
{
"Role": {
"Label": "tool"
},
"Items": [
{
"$type": "TextContent",
"Text": "2025/3/15 15:34:48"
},
{
"$type": "FunctionResultContent",
"CallId": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080",
"PluginName": "TimeUtil",
"FunctionName": "GetCurrentTime",
"Result": "2025/3/15 15:34:48"
}
]
},
{
"Role": {
"Label": "Assistant"
},
"Items": [
{
"$type": "TextContent",
"Text": "\u003Cthink\u003E\nOkay, the user is asking, \u0022what time is it now?\u0022 I need to figure out which tool to use here. Let me check the available functions. There\u0027s TimeUtil-GetCurrentTime, which gets the current date and time. The parameters require a callid, but the user didn\u0027t provide one. The description says to pass any value, so maybe I can just use a placeholder like \u0022user_request\u0022 or something. The Memory-GetTextSearchResults is for searching content, which isn\u0027t needed here. So I should call TimeUtil-GetCurrentTime with a callid parameter. Let me make sure the parameters are correct. The function requires callid as a string, so I\u0027ll set it to \u0022user_query\u0022. Alright, that should work.\n\u003C/think\u003E\n\n\u003Cthink\u003E\nOkay, the user asked, \u0022what time is it now?\u0022 So I need to figure out how to respond. Let me check the tools available. There\u0027s the TimeUtil-GetCurrentTime function. The user\u0027s initial query was about the current time, so I called that function. The response from the tool was \u00222025/3/15 15:34:48\u0022. \n\nWait, the date format here is year/month/day followed by the time. The user probably wants the time in a more readable format. Let me parse that. The time part is 15:34:48, which is 3:34:48 PM. The date is March 15, 2025. But since the question was about the current time, maybe I should present it as the time first, then the date.\n\nI should also check if the user is in a specific time zone, but the tool didn\u0027t mention that. The function\u0027s description just says it gets the current date and time, so I\u0027ll assume it\u0027s in the correct time zone. \n\nPutting it all together, the response should clearly state the time and date. Let me format it in a friendly way. \u0022The current time is 3:34:48 PM on March 15, 2025.\u0022 That sounds good. I should make sure to mention both the time and the date as provided by the tool. Alright, that should answer the user\u0027s question accurately.\n\u003C/think\u003E\n\nThe current time is **3:34:48 PM on March 15, 2025**.",
"ModelId": "QwQ",
"Metadata": {
"CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
"CreatedAt": "2025-03-15T07:35:23\u002B00:00",
"SystemFingerprint": null,
"RefusalUpdate": null,
"Usage": null,
"FinishReason": null,
"ChatResponseMessage.FunctionToolCalls": [
{
"Kind": 0,
"FunctionName": "TimeUtil-GetCurrentTime",
"FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
"Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
}
]
}
}
],
"ModelId": "QwQ",
"Metadata": {
"CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
"CreatedAt": "2025-03-15T07:35:23\u002B00:00",
"SystemFingerprint": null,
"RefusalUpdate": null,
"Usage": null,
"FinishReason": null,
"ChatResponseMessage.FunctionToolCalls": [
{
"Kind": 0,
"FunctionName": "TimeUtil-GetCurrentTime",
"FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
"Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
}
]
}
}
]
```
…Tom/semantic-kernel into fix-clear-content-buffer
@markwallace-microsoft I rewrote the relevant logic to pass the unit test. Instead of changing the code in ChatCompletion, I added a check in AddStreamingMessageAsync that detects whether the number of messages in the chat history has increased and, if so, resets the content buffer.
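For illustration, here is a minimal sketch of that message-count check; the method name and structure are assumptions for readability, not the exact code in this PR:

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

internal static class ContentBufferSketch
{
    // Illustrative only: mirrors the message-count check described above,
    // not the actual implementation in the connector.
    public static async IAsyncEnumerable<StreamingChatMessageContent> AddStreamingMessageWithResetAsync(
        ChatHistory chatHistory,
        IAsyncEnumerable<StreamingChatMessageContent> streamingMessageContents)
    {
        int knownMessageCount = chatHistory.Count; // messages seen so far
        StringBuilder contentBuffer = new();       // accumulates streamed text

        await foreach (var chunk in streamingMessageContents)
        {
            // AutoInvoke appends assistant/tool messages to the history while
            // we are still iterating; growth means a function-call round ended
            // and any buffered text already belongs to an earlier message.
            if (chatHistory.Count > knownMessageCount)
            {
                contentBuffer.Clear();
                knownMessageCount = chatHistory.Count;
            }

            if (!string.IsNullOrEmpty(chunk.Content))
            {
                contentBuffer.Append(chunk.Content);
            }

            yield return chunk;
        }

        // Commit only the final round's text as the last assistant message.
        chatHistory.AddAssistantMessage(contentBuffer.ToString());
    }
}
```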
I wonder why we are resetting the content buffer. Is that also done for the non-streaming behavior? I might be missing something here?
@RogerBarreto To reproduce this bug, you need a model that has both reasoning and function calling capabilities. As far as I know, only QwQ has this combination. Before the model calls a function, it writes its reasoning to the text content. When the function returns its result, the first round of reasoning and the function call result are written to the chat history at the same time. However, the outer AddStreamingMessageAsync does not observe this change, so the first round of reasoning remains in the contentBuffer and is written to the second assistant message together with the result of the second chat completion.
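To make the failure mode concrete, here is a tiny self-contained illustration (plain strings stand in for the streamed chunks; this is not the connector's code):

```csharp
using System;
using System.Text;

// Demonstrates the accumulation described above with hard-coded strings.
internal class ContentBufferLeakDemo
{
    private static void Main()
    {
        StringBuilder contentBuffer = new();

        // Round 1: the reasoning model streams its thinking, then emits a
        // tool call. AutoInvoke commits the round-1 assistant message and the
        // tool result to the chat history, but nothing clears contentBuffer.
        contentBuffer.Append("<think>round-1 reasoning</think>\n");

        // Round 2: the final answer is streamed into the SAME buffer.
        contentBuffer.Append("<think>round-2 reasoning</think>\nThe current time is ...");

        // The second assistant message ends up containing both rounds,
        // exactly as in the serialized ChatHistory above.
        Console.WriteLine(contentBuffer.ToString());
    }
}
```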
@MadLongTom are you able to reproduce the error with any OpenAI reasoning models? |
@markwallace-microsoft |
fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in AddStreamingMessageAsync
Motivation and Context
When using the OpenAI connector with AutoInvoke enabled, the content buffer was not being properly cleared between function calls. This caused content from previous responses to be incorrectly included in subsequent streaming responses, leading to duplicated or incorrect content in the chat history.
This issue specifically affects scenarios where streaming is combined with AutoInvoke and the model emits text content (such as reasoning output) before its function calls, so text from one completion round can leak into the next.
The fix ensures that the content buffer is properly cleared when a function call or tool call is completed, preventing content from previous responses from leaking into new responses.
Fixes #10910
Description
The changes address the issue by adding logic to clear the content buffer when a message with a finish reason of FunctionCall, ToolCalls, or Stop is received. This ensures that when AutoInvoke triggers a new completion request, it starts with a clean buffer.

Additionally, I've updated the OpenAIStreamingChatMessageContent creation to properly include the finish reason when yielding the last message, ensuring that the streaming content correctly reflects the completion status of the response.

These changes maintain the existing functionality while fixing the content accumulation issue, providing a more reliable experience when using streaming with function calling in the OpenAI connector.
Contribution Checklist