.Net: Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient #12197


Merged
5 commits merged into microsoft:main, Jun 10, 2025

Conversation

stephentoub
Member

No description provided.

@stephentoub stephentoub requested a review from a team as a code owner May 20, 2025 14:19
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels May 20, 2025
@github-actions github-actions bot changed the title Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient .Net: Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient May 20, 2025
@RogerBarreto
Member

The model seems to be loaded during client initialization; it should happen only at request time.

    public OnnxRuntimeGenAIChatClient(string modelPath, OnnxRuntimeGenAIChatClientOptions? options = null)
    {
        //...
        _model = new Model(modelPath);
        _tokenizer = new Tokenizer(_model);
    }

@stephentoub
Member Author

The model seems to be loaded during client initialization; it should happen only at request time.

We can, but why do we want to do that? Any config failures won't be noticed until use; additional code (not present in the current impl) is necessary to prevent concurrent usage from loading the likely multi-GB model multiple times; and first use will be delayed by a potentially very long time, likely timing out.

@RogerBarreto
Member

Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in integration tests.


@RogerBarreto
Member

RogerBarreto commented May 20, 2025

We can, but why do we want to do that?

Don't want to add behavioral changes to the IChatCompletionService that customers may already be relying on.

Any config failures won't be noticed until use; additional code (not present in the current impl) is necessary to prevent concurrent usage from loading the likely multi-GB model multiple times.

Currently the unit tests are failing because of loading the model. I would agree that a fail-fast should happen if the file does not exist, but not by loading the model.

Normally for local model usage, as we see for instance with Ollama, the model gets loaded at request time, which is ultimately how local-model applications have been built.

I would also consider, for this early scenario, having the IChatCompletionService(Model) use the ChatClient(model) ctor.

@RogerBarreto
Member

RogerBarreto commented May 20, 2025

Adding the delay on the service implementation side, so it doesn't necessarily require a change to the original OnnxChatClient impl.
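The service-side delay described here could be sketched roughly as below. This is a hypothetical illustration, not the actual change from the PR: the wrapper class name and the namespace of OnnxRuntimeGenAIChatClient are assumptions. The idea is to fail fast on an invalid path in the constructor while wrapping the expensive client creation in a Lazy<T>, whose ExecutionAndPublication mode addresses the concurrency concern raised above (concurrent first requests cannot load the multi-GB model more than once).

```csharp
using System;
using System.IO;
using System.Threading;
using Microsoft.ML.OnnxRuntimeGenAI; // assumed namespace for the chat client

// Hypothetical sketch: defer the model load to first use on the service
// side, leaving OnnxRuntimeGenAIChatClient itself unchanged.
public sealed class LazyOnnxChatService
{
    private readonly Lazy<OnnxRuntimeGenAIChatClient> _client;

    public LazyOnnxChatService(string modelPath)
    {
        // Fail fast on an obviously bad path without paying the load cost.
        if (!Directory.Exists(modelPath) && !File.Exists(modelPath))
        {
            throw new FileNotFoundException("Model path not found.", modelPath);
        }

        // ExecutionAndPublication guarantees the factory runs at most once,
        // so concurrent first requests cannot each load the model.
        _client = new Lazy<OnnxRuntimeGenAIChatClient>(
            () => new OnnxRuntimeGenAIChatClient(modelPath),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // The first call to any chat method triggers the (slow) model load.
    private OnnxRuntimeGenAIChatClient Client => _client.Value;
}
```

The trade-off noted earlier in the thread still applies: with this pattern, the first request absorbs the full model-load time and may exceed request timeouts.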

@stephentoub
Member Author

Their 0.8.0 still relies on the 9.4 preview. Getting "Method not found" in integration tests.


Ugh, I thought 0.8.0 included the update to the stable dependency. We'll need to wait.

@stephentoub
Member Author

Updated to 0.8.1

@stephentoub stephentoub added this pull request to the merge queue Jun 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@RogerBarreto RogerBarreto added this pull request to the merge queue Jun 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 10, 2025
@markwallace-microsoft
Member

markwallace-microsoft commented Jun 10, 2025

One unrelated integration test failed

[xUnit.net 00:03:34.59]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync [FAIL]
[xUnit.net 00:03:34.59]       Assert.NotEmpty() Failure: Collection was empty
[xUnit.net 00:03:34.59]       Stack Trace:
[xUnit.net 00:03:34.59]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/IntegrationTests/Connectors/OpenAI/OpenAIChatCompletion_NonStreamingTests.cs(162,0): at SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithWebSearchAsync()
[xUnit.net 00:03:34.59]         --- End of stack trace from previous location ---

@markwallace-microsoft markwallace-microsoft added this pull request to the merge queue Jun 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 10, 2025
@markwallace-microsoft
Member

More unrelated integration test failures

[xUnit.net 00:01:22.94]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAIChatCompletionNonStreamingTests.ChatCompletionWithAudioInputAndOutputAsync [FAIL]
[xUnit.net 00:01:22.95]       Microsoft.SemanticKernel.HttpOperationException : Service request failed.
[xUnit.net 00:01:22.95]       Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]       
[xUnit.net 00:01:22.95]       ---- System.ClientModel.ClientResultException : Service request failed.
[xUnit.net 00:01:22.95]       Status: 503 (Service Unavailable)
[xUnit.net 00:01:22.95]       
[xUnit.net 00:01:22.95]       Stack Trace:
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.cs(244,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request)
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/Connectors/Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs(171,0): at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.GetChatMessageContentsAsync(String targetModel, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
[xUnit.net 00:01:22.95]         /home/runner/work/semantic-kernel/semantic-kernel/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatCompletionServiceExtensions.cs(83,0): at Microsoft.SemanticKernel.ChatCompletion.ChatCompletionServiceExtensions.GetChatMessageContentAsync(IChatCompletionService chatCompletionService, ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)


[xUnit.net 00:04:05.05]     SemanticKernel.IntegrationTests.Connectors.OpenAI.OpenAITextToAudioTests.OpenAITextToAudioTestAsync [FAIL]
[xUnit.net 00:04:05.05]       System.Threading.Tasks.TaskCanceledException : The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
[xUnit.net 00:04:05.05]       ---- System.TimeoutException : The operation was canceled.
[xUnit.net 00:04:05.05]       -------- System.Threading.Tasks.TaskCanceledException : The operation was canceled.
[xUnit.net 00:04:05.05]       ------------ System.IO.IOException : Unable to read data from the transport connection: Operation canceled.
[xUnit.net 00:04:05.05]       ---------------- System.Net.Sockets.SocketException : Operation canceled

@markwallace-microsoft markwallace-microsoft added this pull request to the merge queue Jun 10, 2025
Merged via the queue into microsoft:main with commit d2120d4 Jun 10, 2025
19 checks passed