Labels
bug (Something isn't working), python (Pull requests for the Python Semantic Kernel)
Description
Describe the bug
('Failed Inference with ONNX', AttributeError("'onnxruntime_genai.onnxruntime_genai.GeneratorParams' object has no attribute 'input_ids'")).
To Reproduce
Steps to reproduce the behavior:
Download the Phi-4-mini-instruct-onnx model from Hugging Face and run the sample code below.
Platform
- Language: Python
- Source: pip package version 1.35.3
- AI model: Phi-4-mini-instruct-onnx
- IDE: VS Code
- OS: Windows
Additional context
Code I am trying to run:
import asyncio
import os

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel

kernel = Kernel()

service_id = "phi3"
streaming = True

model_path = os.path.join(
    os.path.dirname(__file__),
    "models", "Phi-4-mini-instruct-onnx", "cpu_and_mobile", "cpu-int4-rtn-block-32-acc-level-4"
)

chat_completion = OnnxGenAIChatCompletion(ai_model_path=model_path, ai_model_id=service_id, template="phi3")
chat_completion.enable_multi_modality = False
settings = OnnxGenAIPromptExecutionSettings()

system_message = """You are a helpful assistant."""
chat_history = ChatHistory(system_message=system_message)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    chat_history.add_user_message(user_input)

    if streaming:
        print("Mosscap:> ", end="")
        message = ""
        async for chunk in chat_completion.get_streaming_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        ):
            if chunk:
                print(str(chunk), end="")
                message += str(chunk)
        chat_history.add_assistant_message(message)
        print("")
    else:
        answer = await chat_completion.get_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        )
        print(f"Mosscap:> {answer}")
        chat_history.add_message(answer)
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
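For context on the traceback: my assumption (not confirmed in this report) is that the installed onnxruntime-genai release no longer exposes GeneratorParams.input_ids, which newer versions replaced with Generator.append_tokens, while the semantic-kernel connector still assigns to the old attribute. A minimal sketch of a shim that tolerates either API, using a hypothetical helper name (feed_prompt) and duck-typed params/generator objects rather than the real onnxruntime_genai classes:

```python
def feed_prompt(params, make_generator, tokens):
    """Feed prompt tokens using whichever onnxruntime-genai-style API exists.

    Hypothetical compatibility sketch, not the actual semantic-kernel code:
    older releases set tokens on the params object before constructing the
    generator; newer releases construct the generator first and append tokens.
    """
    if hasattr(params, "input_ids"):
        params.input_ids = tokens       # older API: set tokens on the params
        return make_generator(params)
    generator = make_generator(params)  # newer API: build the generator first...
    generator.append_tokens(tokens)     # ...then append the prompt tokens
    return generator
```

In practice the simpler workaround is probably aligning package versions (an onnxruntime-genai release that still has GeneratorParams.input_ids, or a semantic-kernel release whose connector uses append_tokens), since the failing assignment happens inside the connector rather than in the user code above.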