.Net: fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in Ad… #10911


Open · wants to merge 11 commits into main
Conversation

MadLongTom

fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in AddStreamingMessageAsync

Motivation and Context

When using the OpenAI connector with AutoInvoke enabled, the content buffer was not being properly cleared between function calls. This caused content from previous responses to be incorrectly included in subsequent streaming responses, leading to duplicated or incorrect content in the chat history.

This issue specifically affects scenarios where:

  1. Streaming is enabled
  2. AutoInvoke is used for function calling
  3. Multiple function calls occur in sequence

The fix ensures that the content buffer is properly cleared when a function call or tool call is completed, preventing content from previous responses from leaking into new responses.

Fixes #10910

Description

The changes address the issue by adding logic to clear the content buffer when a message with a finish reason of FunctionCall, ToolCalls, or Stop is received. This ensures that when AutoInvoke triggers a new completion request, it starts with a clean buffer.

Additionally, I've updated the OpenAIStreamingChatMessageContent creation to properly include the finish reason when yielding the last message, ensuring that the streaming content correctly reflects the completion status of the response.

These changes maintain the existing functionality while fixing the content accumulation issue, providing a more reliable experience when using streaming with function calling in the OpenAI connector.
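Conceptually, the clearing step described above can be sketched as follows. This is a minimal illustration under assumptions: `contentBuffer` and `ClearBufferIfRoundCompleted` are placeholder names, not the connector's actual internals, and the real finish reasons are enum values rather than strings.

```csharp
using System.Text;

internal static class BufferSketch
{
    // Illustrative only: the real connector tracks richer state than a
    // plain StringBuilder of accumulated text.
    public static void ClearBufferIfRoundCompleted(StringBuilder contentBuffer, string? finishReason)
    {
        // A FunctionCall/ToolCalls finish reason ends one AutoInvoke round;
        // Stop ends the whole response. Either way, the text accumulated so
        // far belongs to the round that just finished.
        if (finishReason is "FunctionCall" or "ToolCalls" or "Stop")
        {
            // Reset so the next completion request starts from a clean buffer
            // instead of re-emitting the previous round's content.
            contentBuffer.Clear();
        }
    }
}
```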

Contribution Checklist

@MadLongTom MadLongTom requested a review from a team as a code owner March 11, 2025 09:26
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels Mar 11, 2025
@github-actions github-actions bot changed the title fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in Ad… .Net: fix(OpenAI): ContentBuffer is not cleared when using AutoInvoke in Ad… Mar 11, 2025
@MadLongTom
Author

@microsoft-github-policy-service agree

@markwallace-microsoft
Member

@MadLongTom thanks for the contribution. Do you have any tests or samples that demonstrate the issue you are fixing?

@MadLongTom
Author

MadLongTom commented Mar 12, 2025

> @MadLongTom thanks for the contribution. Do you have any tests or samples that demonstrate the issue you are fixing?

@markwallace-microsoft I've written a test sample in https://github.com/MadLongTom/semantic-kernel/tree/sample/dotnet/AutoInvokeTest

@SergeyMenshykh
Member

Hi @MadLongTom, thank you for your PR. I tried to reproduce the issue but was unable to do so. While attempting this, I distilled your sample into this relatively simple test. Could you please run it and show the duplicated/incorrect content in the chat history? Alternatively, if I missed something from your sample that makes the issue non-reproducible, could you please fix this test?

[Fact]
public async Task ReproduceContentBufferIssue()
{
    IChatCompletionService chatCompletion = new OpenAIChatCompletionService(
        modelId: TestConfiguration.OpenAI.ChatModelId,
        apiKey: TestConfiguration.OpenAI.ApiKey);

    Kernel kernel = new();
    kernel.ImportPluginFromType<Plugin>();

    ChatHistory chatHistory = new("you are a helpful assistant.");
    chatHistory.AddUserMessage("What is current weather at London?");

    OpenAIPromptExecutionSettings requestSettings = new()
    {
        ExtensionData = new Dictionary<string, object>(),
        MaxTokens = 4096,
        Temperature = 0.6,
        TopP = 0.95,
        ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
    };

    IAsyncEnumerable<StreamingChatMessageContent> results = chatHistory.AddStreamingMessageAsync(
        streamingMessageContents: chatCompletion.GetStreamingChatMessageContentsAsync(chatHistory, requestSettings, kernel).Cast<OpenAIStreamingChatMessageContent>(),
        includeToolCalls: true);

    StringBuilder fullMessage = new();

    await foreach (var completionResult in results)
    {
        fullMessage.Append(completionResult);
    }

    Console.WriteLine(fullMessage.ToString());
}

public class Plugin
{
    [KernelFunction]
    public string GetCurrentTime()
    {
        return DateTime.Now.ToString();
    }

    [KernelFunction]
    public string GetCurrentWeather(string location, string time)
    {
        return $"The weather in {location} at {time} is 72 degrees.";
    }
}

Here is the chat history with no duplicates:
(screenshot omitted)

@MadLongTom
Author

MadLongTom commented Mar 15, 2025

@SergeyMenshykh Thank you for writing the test case. I noticed that you are probably testing with a model without reasoning capabilities, so there is no text content in the first assistant function-call reply, only function call content. In the bug issue I submitted, I used the QwQ:32b model, which natively supports both reasoning and function calling. The reasoning is written to the assistant's text content before each function call, and I've serialized the ChatHistory from my test run as JSON:

[
    {
        "Role": {
            "Label": "system"
        },
        "Items": [
            {
                "$type": "TextContent",
                "Text": "you are a helpful assistant."
            }
        ]
    },
    {
        "Role": {
            "Label": "user"
        },
        "Items": [
            {
                "$type": "TextContent",
                "Text": "what time is it now?"
            }
        ]
    },
    {
        "Role": {
            "Label": "Assistant"
        },
        "Items": [
            {
                "$type": "TextContent",
                "Text": "\u003Cthink\u003E\nOkay, the user is asking, \u0022what time is it now?\u0022 I need to figure out which tool to use here. Let me check the available functions. There\u0027s TimeUtil-GetCurrentTime, which gets the current date and time. The parameters require a callid, but the user didn\u0027t provide one. The description says to pass any value, so maybe I can just use a placeholder like \u0022user_request\u0022 or something. The Memory-GetTextSearchResults is for searching content, which isn\u0027t needed here. So I should call TimeUtil-GetCurrentTime with a callid parameter. Let me make sure the parameters are correct. The function requires callid as a string, so I\u0027ll set it to \u0022user_query\u0022. Alright, that should work.\n\u003C/think\u003E\n\n",
                "ModelId": "QwQ",
                "Metadata": {
                    "CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
                    "CreatedAt": "2025-03-15T07:35:23\u002B00:00",
                    "SystemFingerprint": null,
                    "RefusalUpdate": null,
                    "Usage": {
                        "OutputTokenCount": 183,
                        "InputTokenCount": 315,
                        "TotalTokenCount": 498,
                        "OutputTokenDetails": null,
                        "InputTokenDetails": null
                    },
                    "FinishReason": null,
                    "ChatResponseMessage.FunctionToolCalls": [
                        {
                            "Kind": 0,
                            "FunctionName": "TimeUtil-GetCurrentTime",
                            "FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
                            "Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
                        }
                    ]
                }
            },
            {
                "$type": "FunctionCallContent",
                "Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080",
                "PluginName": "TimeUtil",
                "FunctionName": "GetCurrentTime",
                "Arguments": {
                    "callid": "user_query"
                }
            }
        ],
        "ModelId": "QwQ",
        "Metadata": {
            "CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
            "CreatedAt": "2025-03-15T07:35:23\u002B00:00",
            "SystemFingerprint": null,
            "RefusalUpdate": null,
            "Usage": {
                "OutputTokenCount": 183,
                "InputTokenCount": 315,
                "TotalTokenCount": 498,
                "OutputTokenDetails": null,
                "InputTokenDetails": null
            },
            "FinishReason": null,
            "ChatResponseMessage.FunctionToolCalls": [
                {
                    "Kind": 0,
                    "FunctionName": "TimeUtil-GetCurrentTime",
                    "FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
                    "Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
                }
            ]
        }
    },
    {
        "Role": {
            "Label": "tool"
        },
        "Items": [
            {
                "$type": "TextContent",
                "Text": "2025/3/15 15:34:48"
            },
            {
                "$type": "FunctionResultContent",
                "CallId": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080",
                "PluginName": "TimeUtil",
                "FunctionName": "GetCurrentTime",
                "Result": "2025/3/15 15:34:48"
            }
        ]
    },
    {
        "Role": {
            "Label": "Assistant"
        },
        "Items": [
            {
                "$type": "TextContent",
                "Text": "\u003Cthink\u003E\nOkay, the user is asking, \u0022what time is it now?\u0022 I need to figure out which tool to use here. Let me check the available functions. There\u0027s TimeUtil-GetCurrentTime, which gets the current date and time. The parameters require a callid, but the user didn\u0027t provide one. The description says to pass any value, so maybe I can just use a placeholder like \u0022user_request\u0022 or something. The Memory-GetTextSearchResults is for searching content, which isn\u0027t needed here. So I should call TimeUtil-GetCurrentTime with a callid parameter. Let me make sure the parameters are correct. The function requires callid as a string, so I\u0027ll set it to \u0022user_query\u0022. Alright, that should work.\n\u003C/think\u003E\n\n\u003Cthink\u003E\nOkay, the user asked, \u0022what time is it now?\u0022 So I need to figure out how to respond. Let me check the tools available. There\u0027s the TimeUtil-GetCurrentTime function. The user\u0027s initial query was about the current time, so I called that function. The response from the tool was \u00222025/3/15 15:34:48\u0022. \n\nWait, the date format here is year/month/day followed by the time. The user probably wants the time in a more readable format. Let me parse that. The time part is 15:34:48, which is 3:34:48 PM. The date is March 15, 2025. But since the question was about the current time, maybe I should present it as the time first, then the date.\n\nI should also check if the user is in a specific time zone, but the tool didn\u0027t mention that. The function\u0027s description just says it gets the current date and time, so I\u0027ll assume it\u0027s in the correct time zone. \n\nPutting it all together, the response should clearly state the time and date. Let me format it in a friendly way. \u0022The current time is 3:34:48 PM on March 15, 2025.\u0022 That sounds good. I should make sure to mention both the time and the date as provided by the tool. Alright, that should answer the user\u0027s question accurately.\n\u003C/think\u003E\n\nThe current time is **3:34:48 PM on March 15, 2025**.",
                "ModelId": "QwQ",
                "Metadata": {
                    "CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
                    "CreatedAt": "2025-03-15T07:35:23\u002B00:00",
                    "SystemFingerprint": null,
                    "RefusalUpdate": null,
                    "Usage": null,
                    "FinishReason": null,
                    "ChatResponseMessage.FunctionToolCalls": [
                        {
                            "Kind": 0,
                            "FunctionName": "TimeUtil-GetCurrentTime",
                            "FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
                            "Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
                        }
                    ]
                }
            }
        ],
        "ModelId": "QwQ",
        "Metadata": {
            "CompletionId": "chatcmpl-86e780b4173646ba839e109b8b35248a",
            "CreatedAt": "2025-03-15T07:35:23\u002B00:00",
            "SystemFingerprint": null,
            "RefusalUpdate": null,
            "Usage": null,
            "FinishReason": null,
            "ChatResponseMessage.FunctionToolCalls": [
                {
                    "Kind": 0,
                    "FunctionName": "TimeUtil-GetCurrentTime",
                    "FunctionArguments": "eyJjYWxsaWQiOiAidXNlcl9xdWVyeSJ9",
                    "Id": "chatcmpl-tool-0e23562d158645fa8a7f044278c8d080"
                }
            ]
        }
    }
]

@MadLongTom
Author

MadLongTom commented Mar 18, 2025

@markwallace-microsoft I rewrote the relevant logic to pass the unit tests. Instead of changing the code in ChatCompletion, I added a check in AddStreamingMessageAsync that determines whether the number of messages in chatHistory has increased and, if so, resets the content buffer.
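The revised check could look roughly like this inside the streaming loop. This is a sketch under assumptions, not the actual AddStreamingMessageAsync code: the variable names and the surrounding async-iterator context are illustrative.

```csharp
using System.Text;
using Microsoft.SemanticKernel;

// Sketch: assumes this runs inside an async iterator that receives
// `chatHistory` and an IAsyncEnumerable<StreamingChatMessageContent>
// named `streamingContents`; names are placeholders.
int lastHistoryCount = chatHistory.Count;
StringBuilder contentBuffer = new();

await foreach (StreamingChatMessageContent update in streamingContents)
{
    // AutoInvoke appends the intermediate assistant and tool messages to the
    // history between rounds, so a larger count signals that a new round of
    // streaming has started and the previously buffered text is stale.
    if (chatHistory.Count > lastHistoryCount)
    {
        contentBuffer.Clear();
        lastHistoryCount = chatHistory.Count;
    }

    contentBuffer.Append(update.Content);
}
```

The appeal of this variant is that it keeps the fix local to AddStreamingMessageAsync rather than touching the chat-completion internals.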

@RogerBarreto
Member

I wonder why we are resetting the content buffer. Is that done for the non-streaming behavior too? I might be missing something here.

@SergeyMenshykh @MadLongTom

@MadLongTom
Author

> I wonder why we are resetting the content buffer. Is that done for the non-streaming behavior too? I might be missing something here.
>
> @SergeyMenshykh @MadLongTom

@RogerBarreto To reproduce this bug, you need a model that supports both reasoning and function calling. As far as I know, only QwQ has this capability. Before the model calls a function, it writes its reasoning to the text content. When the function returns a result, the first round of reasoning and the function call are written to the chat history at the same time. However, the outer AddStreamingMessageAsync does not perceive this change, so the first round of reasoning remains in the contentBuffer and is written into the second assistant message together with the result of the second chat completion.

@markwallace-microsoft
Member

@MadLongTom are you able to reproduce the error with any OpenAI reasoning models?

@MadLongTom
Author

> @MadLongTom are you able to reproduce the error with any OpenAI reasoning models?

@markwallace-microsoft o3-mini supports both reasoning and function calling, but it decides to call the function immediately without emitting any reasoning content, so I can't reproduce this error with an OpenAI model.

Successfully merging this pull request may close these issues.

.Net Bug: ContentBuffer is not cleared when using AutoInvoke in AddStreamingMessageAsync (OpenAI Connector)