.Net: Implement OnnxRuntimeGenAIChatCompletionService on OnnxRuntimeGenAIChatClient #12197
Conversation
The model seems to be loaded during initialization of the client; it should only happen at request time.

```csharp
public OnnxRuntimeGenAIChatClient(string modelPath, OnnxRuntimeGenAIChatClientOptions? options = null)
{
    //...
    _model = new Model(modelPath);
    _tokenizer = new Tokenizer(_model);
}
```
We can, but why would we want to do that? Any configuration failure won't be noticed until use, additional code (not present in the current implementation) is necessary to prevent concurrent usage from loading the likely multi-GB model multiple times, and first use will be delayed by a potentially very long time, likely timing out.
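For reference, a minimal sketch of what that extra deferral code might look like if it lived next to the client; the holder type is hypothetical and the Lazy<T>-based approach is an assumption, not the current implementation:

```csharp
using System;
using System.Threading;
using Microsoft.ML.OnnxRuntimeGenAI;

// Hypothetical sketch of deferred loading: Lazy<T> with
// ExecutionAndPublication ensures the multi-GB model is loaded at most
// once even when several requests race on first use.
public sealed class DeferredModelHolder : IDisposable
{
    private readonly Lazy<(Model Model, Tokenizer Tokenizer)> _runtime;

    public DeferredModelHolder(string modelPath)
    {
        _runtime = new Lazy<(Model, Tokenizer)>(() =>
        {
            var model = new Model(modelPath); // expensive: happens on first use
            return (model, new Tokenizer(model));
        }, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // First access pays the load cost; later accesses reuse the cached instances.
    public Model Model => _runtime.Value.Model;
    public Tokenizer Tokenizer => _runtime.Value.Tokenizer;

    public void Dispose()
    {
        if (_runtime.IsValueCreated)
        {
            _runtime.Value.Tokenizer.Dispose();
            _runtime.Value.Model.Dispose();
        }
    }
}
```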
Don't want to add behavioral changes to the existing OnnxRuntimeGenAIChatClient.
Currently the unit tests are failing because the model is being loaded. I agree that we should fail fast if the file does not exist, but not by loading the model. For local model usage, what we typically see (for instance with Ollama) is that the model gets loaded at request time, which is ultimately how local-model applications are built. I would also consider for this
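For example, a fail-fast check along these lines would validate the path up front without loading the model; the helper and its name are hypothetical:

```csharp
using System.IO;

// Hypothetical helper: validate the model path in the constructor
// without paying the cost of loading the model itself.
internal static class ModelPathGuard
{
    public static string EnsureExists(string modelPath)
    {
        // OnnxRuntimeGenAI model paths typically point to a directory of
        // model files, but accept a single file as well.
        if (!Directory.Exists(modelPath) && !File.Exists(modelPath))
        {
            throw new DirectoryNotFoundException($"ONNX model path not found: {modelPath}");
        }

        return modelPath;
    }
}
```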
Adding the delay on the Service implementation side, so it doesn't necessarily require a change to the original OnnxRuntimeGenAIChatClient.
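If that route is taken, a rough sketch of the idea (the wrapper name is hypothetical, and it simply applies the same Lazy<T> pattern one level up, so the original OnnxRuntimeGenAIChatClient constructor stays untouched; namespaces are assumptions):

```csharp
using System;
using System.Threading;
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntimeGenAI;

// Hypothetical sketch: defer construction of the underlying client (and
// therefore the model load) to first use on the Service side, leaving the
// client's own constructor behavior unchanged.
public sealed class DeferredOnnxChatService : IDisposable
{
    private readonly Lazy<OnnxRuntimeGenAIChatClient> _client;

    public DeferredOnnxChatService(string modelPath)
    {
        _client = new Lazy<OnnxRuntimeGenAIChatClient>(
            () => new OnnxRuntimeGenAIChatClient(modelPath),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // The heavy load happens here, on the first request.
    public IChatClient Client => _client.Value;

    public void Dispose()
    {
        if (_client.IsValueCreated)
        {
            _client.Value.Dispose();
        }
    }
}
```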
Force-pushed from c20bb04 to f76bc1b
Updated to 0.8.1
One unrelated integration test failed.
More unrelated integration test failures.