💥 Expose agent testing utils #1164
Open
donald-pinckney wants to merge 12 commits into main from d/20251015-125526
Commits (12):
28d9283 Expose some agent testing utils to users (donald-pinckney)
ab762c4 rename test -> testing (donald-pinckney)
ed44c49 update import sites (donald-pinckney)
30b6fbc fix lints (donald-pinckney)
73774a0 cleanup diff (donald-pinckney)
7a9a8d1 fmt (donald-pinckney)
569fc64 Add experimental warnings to docs (donald-pinckney)
d249066 Change to factory method (donald-pinckney)
3f0908d fmt (donald-pinckney)
b72dd77 cleanup (donald-pinckney)
296cc3c fmt (donald-pinckney)
d3010d2 lints (donald-pinckney)
@@ -0,0 +1,175 @@
"""Testing utilities for OpenAI agents.""" | ||
|
||
from typing import AsyncIterator, Callable, Optional, Union | ||
|
||
from agents import ( | ||
AgentOutputSchemaBase, | ||
Handoff, | ||
Model, | ||
ModelProvider, | ||
ModelResponse, | ||
ModelSettings, | ||
ModelTracing, | ||
Tool, | ||
TResponseInputItem, | ||
Usage, | ||
) | ||
from agents.items import TResponseOutputItem, TResponseStreamEvent | ||
from openai.types.responses import ( | ||
ResponseFunctionToolCall, | ||
ResponseOutputMessage, | ||
ResponseOutputText, | ||
) | ||
|
||
|
||
class ResponseBuilders: | ||
"""Builders for creating model responses for testing. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
|
||
@staticmethod | ||
def model_response(output: TResponseOutputItem) -> ModelResponse: | ||
"""Create a ModelResponse with the given output. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
return ModelResponse( | ||
output=[output], | ||
usage=Usage(), | ||
response_id=None, | ||
) | ||
|
||
@staticmethod | ||
def response_output_message(text: str) -> ResponseOutputMessage: | ||
"""Create a ResponseOutputMessage with text content. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
return ResponseOutputMessage( | ||
id="", | ||
content=[ | ||
ResponseOutputText( | ||
text=text, | ||
annotations=[], | ||
type="output_text", | ||
) | ||
], | ||
role="assistant", | ||
status="completed", | ||
type="message", | ||
) | ||
|
||
@staticmethod | ||
def tool_call(arguments: str, name: str) -> ModelResponse: | ||
"""Create a ModelResponse with a function tool call. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
return ResponseBuilders.model_response( | ||
ResponseFunctionToolCall( | ||
arguments=arguments, | ||
call_id="call", | ||
name=name, | ||
type="function_call", | ||
id="id", | ||
status="completed", | ||
) | ||
) | ||
|
||
@staticmethod | ||
def output_message(text: str) -> ModelResponse: | ||
"""Create a ModelResponse with an output message. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
return ResponseBuilders.model_response( | ||
ResponseBuilders.response_output_message(text) | ||
) | ||
|
||
|
||
class TestModelProvider(ModelProvider): | ||
"""Test model provider which simply returns the given module. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
|
||
__test__ = False | ||
|
||
def __init__(self, model: Model): | ||
"""Initialize a test model provider with a model. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
self._model = model | ||
|
||
def get_model(self, model_name: Union[str, None]) -> Model: | ||
"""Get a model from the model provider. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
return self._model | ||
|
||
|
||
class TestModel(Model): | ||
"""Test model for use mocking model responses. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
|
||
__test__ = False | ||
|
||
def __init__(self, fn: Callable[[], ModelResponse]) -> None: | ||
"""Initialize a test model with a callable. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
self.fn = fn | ||
|
||
async def get_response( | ||
self, | ||
system_instructions: Union[str, None], | ||
input: Union[str, list[TResponseInputItem]], | ||
model_settings: ModelSettings, | ||
tools: list[Tool], | ||
output_schema: Union[AgentOutputSchemaBase, None], | ||
handoffs: list[Handoff], | ||
tracing: ModelTracing, | ||
**kwargs, | ||
) -> ModelResponse: | ||
"""Get a response from the mocked model, by calling the callable passed to the constructor.""" | ||
return self.fn() | ||
|
||
def stream_response( | ||
self, | ||
system_instructions: Optional[str], | ||
input: Union[str, list[TResponseInputItem]], | ||
model_settings: ModelSettings, | ||
tools: list[Tool], | ||
output_schema: Optional[AgentOutputSchemaBase], | ||
handoffs: list[Handoff], | ||
tracing: ModelTracing, | ||
**kwargs, | ||
) -> AsyncIterator[TResponseStreamEvent]: | ||
"""Get a streamed response from the model. Unimplemented.""" | ||
raise NotImplementedError() | ||
|
||
@staticmethod | ||
def returning_responses(responses: list[ModelResponse]) -> "TestModel": | ||
"""Create a mock model which sequentially returns responses from a list. | ||
|
||
.. warning:: | ||
This API is experimental and may change in the future. | ||
""" | ||
i = iter(responses) | ||
return TestModel(lambda: next(i)) |
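For orientation, here is a minimal sketch of how these builders and the test model might be wired together directly through the OpenAI Agents SDK runner. Agent, Runner, and RunConfig come from that SDK and are assumed here; they are not part of this diff.

import asyncio

from agents import Agent, RunConfig, Runner


async def main() -> None:
    # Canned response returned in place of a real model call.
    model = TestModel.returning_responses(
        [ResponseBuilders.output_message("Hello from the test model!")]
    )

    agent = Agent(name="Hello world", instructions="You only respond in haikus.")

    # Route the runner's model lookup to the canned TestModel.
    result = await Runner.run(
        agent,
        "Say hello.",
        run_config=RunConfig(model_provider=TestModelProvider(model)),
    )
    print(result.final_output)  # "Hello from the test model!"


asyncio.run(main())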
Conversation
A bit hard to see from this PR how this looks from a user POV. One reason we did "ActivityEnvironment" and "WorkflowEnvironment" instead of only the building blocks is that users like the nice simplicity of one-liners and reusable constructs. I'm wondering if there's an opportunity to design something here. If it's not too much trouble, can I see what tests/openai_agents/basic/test_hello_world_workflow.py will look like using these utilities? Part of me wonders if we can have an AgentEnvironment that basically accepts everything the plugin accepts and also some of this mock stuff. So maybe something like:
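(A hypothetical sketch of what such an AgentEnvironment could look like; the AgentEnvironment name, its parameters, and HelloWorldAgent are illustrative assumptions, not anything added by this PR.)

import uuid


async def test_hello_world_agent():
    # Hypothetical construct: accepts everything the plugin accepts, plus the
    # mock responses, and exposes a ready-to-use client and task queue.
    async with AgentEnvironment(
        model_responses=[ResponseBuilders.output_message("test response")],
    ) as env:
        result = await env.client.execute_workflow(
            HelloWorldAgent.run,
            "Tell me about recursion in programming.",
            id=f"hello-world-{uuid.uuid4()}",
            task_queue=env.task_queue,
        )
        assert result == "test response"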
Currently (with the change to static factory method I just pushed), that test would look like the following. client is a fixture that depends on the test_model fixture, so you can override the test_model fixture per test or per module.
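(A minimal sketch of fixtures along those lines, assuming a Temporal client configured through the OpenAI agents plugin; the Client.connect plugin wiring, HelloWorldAgent, and the task queue name are illustrative assumptions rather than code from this PR, and worker setup is omitted.)

import uuid

import pytest

from temporalio.client import Client
from temporalio.contrib.openai_agents import OpenAIAgentsPlugin


@pytest.fixture
def test_model() -> TestModel:
    # Override this fixture per test or per module to change the canned responses.
    return TestModel.returning_responses(
        [ResponseBuilders.output_message("test response")]
    )


@pytest.fixture
async def client(test_model: TestModel) -> Client:
    # Assumed wiring: a client whose agents plugin routes model lookups to the
    # canned TestModel via TestModelProvider.
    return await Client.connect(
        "localhost:7233",
        plugins=[OpenAIAgentsPlugin(model_provider=TestModelProvider(test_model))],
    )


async def test_hello_world_agent(client: Client):
    result = await client.execute_workflow(
        HelloWorldAgent.run,  # assumed sample workflow under test
        "Tell me about recursion in programming.",
        id=f"hello-world-{uuid.uuid4()}",
        task_queue="openai-agents-task-queue",
    )
    assert result == "test response"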
I think for most users this is missing the client and plugin configuration, which I think we should make easy for testers too. To show the full code to compare, you'd have to include your other fixtures, like client configuration and plugin creation. Those fixtures are a little pytest-specific and external to the test, and not really how we have done test helpers in the past. I was thinking of something you could easily configure inside your test for each test (but still share if you want). Basically you need an easy way to configure an existing client with the plugin and model stuff.
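(One hypothetical shape for that inline configuration, sketched only to illustrate the point; with_test_model, HelloWorldAgent, and the task queue name are purely illustrative and do not exist in this PR.)

from temporalio.client import Client


def with_test_model(client: Client, model: TestModel) -> Client:
    """Hypothetical helper: return a client reconfigured with the agents plugin
    and the given canned model (not something this PR provides)."""
    ...


async def test_hello_world_agent(client: Client):
    # Configure the existing client with the plugin and mock model inside the test.
    test_client = with_test_model(
        client,
        TestModel.returning_responses(
            [ResponseBuilders.output_message("test response")]
        ),
    )
    result = await test_client.execute_workflow(
        HelloWorldAgent.run,
        "Tell me about recursion in programming.",
        id="hello-world-test",
        task_queue="openai-agents-task-queue",
    )
    assert result == "test response"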