Commit b650238

Wh1isper and DouweM authored
Let tools return ToolReturn to pass additional content to model, or attach metadata that's not passed to the model (#2060)
Co-authored-by: Douwe Maan <douwe@pydantic.dev>
1 parent 51d1acb commit b650238

5 files changed: +339 -1 lines changed

docs/tools.md

Lines changed: 60 additions & 0 deletions
@@ -293,6 +293,66 @@ _(This example is complete, it can be run "as is")_
 
 Some models (e.g. Gemini) natively support semi-structured return values, while some expect text (OpenAI) but seem to be just as good at extracting meaning from the data. If a Python object is returned and the model expects a string, the value will be serialized to JSON.
 
+### Advanced Tool Returns
+
+For scenarios where you need more control over both the tool's return value and the content sent to the model, you can use [`ToolReturn`][pydantic_ai.messages.ToolReturn]. This is particularly useful when you want to:
+
+- Provide rich multi-modal content (images, documents, etc.) to the model as context
+- Separate the programmatic return value from the model's context
+- Include additional metadata that shouldn't be sent to the LLM
+
+Here's an example of a computer automation tool that captures screenshots and provides visual feedback:
+
+```python {title="advanced_tool_return.py" test="skip" lint="skip"}
+import time
+from pydantic_ai import Agent
+from pydantic_ai.messages import ToolReturn, BinaryContent
+
+agent = Agent('openai:gpt-4o')
+
+@agent.tool_plain
+def click_and_capture(x: int, y: int) -> ToolReturn:
+    """Click at coordinates and show before/after screenshots."""
+    # Take screenshot before action
+    before_screenshot = capture_screen()
+
+    # Perform click operation
+    perform_click(x, y)
+    time.sleep(0.5)  # Wait for UI to update
+
+    # Take screenshot after action
+    after_screenshot = capture_screen()
+
+    return ToolReturn(
+        return_value=f"Successfully clicked at ({x}, {y})",
+        content=[
+            f"Clicked at coordinates ({x}, {y}). Here's the comparison:",
+            "Before:",
+            BinaryContent(data=before_screenshot, media_type="image/png"),
+            "After:",
+            BinaryContent(data=after_screenshot, media_type="image/png"),
+            "Please analyze the changes and suggest next steps."
+        ],
+        metadata={
+            "coordinates": {"x": x, "y": y},
+            "action_type": "click_and_capture",
+            "timestamp": time.time()
+        }
+    )
+
+# The model receives the rich visual content for analysis
+# while your application can access the structured return_value and metadata
+result = agent.run_sync("Click on the submit button and tell me what happened")
+print(result.output)
+# The model can analyze the screenshots and provide detailed feedback
+```
+
+- **`return_value`**: The actual return value used in the tool response. This is what gets serialized and sent back to the model as the tool's result.
+- **`content`**: A sequence of content (text, images, documents, etc.) that provides additional context to the model. This appears as a separate user message.
+- **`metadata`**: Optional metadata that your application can access but is not sent to the LLM. Useful for logging, debugging, or additional processing. Some other AI frameworks call this feature "artifacts".
+
+This separation allows you to provide rich context to the model while maintaining clean, structured return values for your application logic.
+
 ## Function Tools vs. Structured Outputs
 
 As the name suggests, function tools use the model's "tools" or "functions" API to let the model know what is available to call. Tools or functions are also used to define the schema(s) for structured responses, thus a model might have access to many tools, some of which call function tools while others end the run and produce a final output.
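
The three `ToolReturn` fields described in the docs hunk above each go to a different place. The following is a minimal sketch of that routing, not the library's implementation: `ToolReturnSketch` and `route` are hypothetical stand-ins introduced only for illustration, so the block runs without `pydantic_ai` installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ToolReturnSketch:
    """Hypothetical stand-in mirroring the three fields of ToolReturn."""

    return_value: Any
    content: Sequence[Any] | None = None
    metadata: Any = None


def route(tool_return: ToolReturnSketch) -> dict[str, Any]:
    """Show where each field ends up, following the docs text (illustration only)."""
    return {
        # Serialized and sent back to the model as the tool's result.
        'tool_result_for_model': tool_return.return_value,
        # Sent to the model as a separate user message.
        'extra_user_message_for_model': list(tool_return.content or []),
        # Kept for the application; never sent to the model.
        'application_only': tool_return.metadata,
    }


routed = route(
    ToolReturnSketch(
        return_value='Successfully clicked at (10, 20)',
        content=['Before:', b'<png bytes>', 'After:', b'<png bytes>'],
        metadata={'coordinates': {'x': 10, 'y': 20}},
    )
)
```

The key names in `routed` are made up for readability; in the commit itself the destinations are `ToolReturnPart.content`, a `UserPromptPart`, and `ToolReturnPart.metadata`.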

pydantic_ai_slim/pydantic_ai/_agent_graph.py

Lines changed: 32 additions & 1 deletion
@@ -743,6 +743,30 @@ async def process_function_tools(  # noqa C901
                     if isinstance(result, _messages.RetryPromptPart):
                         results_by_index[index] = result
                     elif isinstance(result, _messages.ToolReturnPart):
+                        if isinstance(result.content, _messages.ToolReturn):
+                            tool_return = result.content
+                            if (
+                                isinstance(tool_return.return_value, _messages.MultiModalContentTypes)
+                                or isinstance(tool_return.return_value, list)
+                                and any(
+                                    isinstance(content, _messages.MultiModalContentTypes)
+                                    for content in tool_return.return_value  # type: ignore
+                                )
+                            ):
+                                raise exceptions.UserError(
+                                    f"{result.tool_name}'s `return_value` contains invalid nested MultiModalContentTypes objects. "
+                                    f'Please use `content` instead.'
+                                )
+                            result.content = tool_return.return_value  # type: ignore
+                            result.metadata = tool_return.metadata
+                            if tool_return.content:
+                                user_parts.append(
+                                    _messages.UserPromptPart(
+                                        content=list(tool_return.content),
+                                        timestamp=result.timestamp,
+                                        part_kind='user-prompt',
+                                    )
+                                )
                         contents: list[Any]
                         single_content: bool
                         if isinstance(result.content, list):
@@ -754,7 +778,13 @@ async def process_function_tools(  # noqa C901
 
                         processed_contents: list[Any] = []
                         for content in contents:
-                            if isinstance(content, _messages.MultiModalContentTypes):
+                            if isinstance(content, _messages.ToolReturn):
+                                raise exceptions.UserError(
+                                    f"{result.tool_name}'s return contains invalid nested ToolReturn objects. "
+                                    f'ToolReturn should be used directly.'
+                                )
+                            elif isinstance(content, _messages.MultiModalContentTypes):
+                                # Handle direct multimodal content
                                 if isinstance(content, _messages.BinaryContent):
                                     identifier = multi_modal_content_identifier(content.data)
                                 else:
@@ -769,6 +799,7 @@ async def process_function_tools(  # noqa C901
                                 )
                                 processed_contents.append(f'See file {identifier}')
                             else:
+                                # Handle regular content
                                 processed_contents.append(content)
 
                         if single_content:
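
The guard added in the first hunk above mixes `or` and `and` without parentheses. Because `and` binds more tightly than `or` in Python, it reads as "the value is multi-modal, or it is a list that contains something multi-modal". The sketch below spells out both halves under stated assumptions: `FakeImage`, `MULTI_MODAL_TYPES`, `check_return_value` and `is_rejected` are hypothetical names, and `ValueError` stands in for `exceptions.UserError`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeImage:
    """Hypothetical stand-in for a multi-modal type such as BinaryContent."""

    data: bytes


# Stand-in for _messages.MultiModalContentTypes (a tuple of classes).
MULTI_MODAL_TYPES = (FakeImage,)


def check_return_value(tool_name: str, return_value: Any) -> Any:
    """Reject multi-modal objects in return_value, directly or inside a list."""
    is_multi_modal = isinstance(return_value, MULTI_MODAL_TYPES)
    has_nested = isinstance(return_value, list) and any(
        isinstance(item, MULTI_MODAL_TYPES) for item in return_value
    )
    # Equivalent to the unparenthesized `A or B and C` form in the diff.
    if is_multi_modal or has_nested:
        raise ValueError(
            f"{tool_name}'s `return_value` contains multi-modal content. "
            'Please use `content` instead.'
        )
    return return_value


def is_rejected(value: Any) -> bool:
    """Helper for the examples: True if the guard raises for this value."""
    try:
        check_return_value('demo_tool', value)
    except ValueError:
        return True
    return False
```

With these stand-ins, a plain string or a list of numbers passes through unchanged, while a `FakeImage` or a list containing one is rejected.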

pydantic_ai_slim/pydantic_ai/messages.py

Lines changed: 26 additions & 0 deletions
@@ -306,6 +306,29 @@ def format(self) -> str:
 
 UserContent: TypeAlias = 'str | ImageUrl | AudioUrl | DocumentUrl | VideoUrl | BinaryContent'
 
+
+@dataclass(repr=False)
+class ToolReturn:
+    """A structured return value for tools that need to provide both a return value and custom content to the model.
+
+    This class allows tools to return complex responses that include:
+    - A return value for actual tool return
+    - Custom content (including multi-modal content) to be sent to the model as a UserPromptPart
+    - Optional metadata for application use
+    """
+
+    return_value: Any
+    """The return value to be used in the tool response."""
+
+    content: Sequence[UserContent] | None = None
+    """The content sequence to be sent to the model as a UserPromptPart."""
+
+    metadata: Any = None
+    """Additional data that can be accessed programmatically by the application but is not sent to the LLM."""
+
+    __repr__ = _utils.dataclasses_no_defaults_repr
+
+
 # Ideally this would be a Union of types, but Python 3.9 requires it to be a string, and strings don't work with `isinstance``.
 MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent)
 _document_format_lookup: dict[str, DocumentFormat] = {
@@ -396,6 +419,9 @@ class ToolReturnPart:
     tool_call_id: str
     """The tool call identifier, this is used by some models including OpenAI."""
 
+    metadata: Any = None
+    """Additional data that can be accessed programmatically by the application but is not sent to the LLM."""
+
     timestamp: datetime = field(default_factory=_now_utc)
     """The timestamp, when the tool returned."""
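
The context comment on `MultiModalContentTypes` above explains why the module keeps a tuple of classes next to the string alias `UserContent`: a string annotation is not a runtime type, while `isinstance` does accept a tuple of classes. A small sketch of that difference, using hypothetical stand-in classes (`ImageUrlSketch`, `BinaryContentSketch`) rather than the real ones:

```python
from dataclasses import dataclass


@dataclass
class ImageUrlSketch:
    """Hypothetical stand-in for ImageUrl."""

    url: str


@dataclass
class BinaryContentSketch:
    """Hypothetical stand-in for BinaryContent."""

    data: bytes


# A string alias works for type checkers but is just a str at runtime.
UserContentAlias = 'str | ImageUrlSketch | BinaryContentSketch'

# A tuple of classes is something isinstance() accepts at runtime.
MultiModalTypesSketch = (ImageUrlSketch, BinaryContentSketch)


def is_multi_modal(value: object) -> bool:
    """Runtime check against the tuple of classes."""
    return isinstance(value, MultiModalTypesSketch)


def string_alias_fails() -> bool:
    """True if isinstance() rejects the string alias with a TypeError."""
    try:
        isinstance('x', UserContentAlias)  # type: ignore[arg-type]
    except TypeError:
        return True
    return False
```

This is why the checks in `_agent_graph.py` are written against the tuple and not against `UserContent`.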

tests/models/test_model_function.py

Lines changed: 1 addition & 0 deletions
@@ -231,6 +231,7 @@ def test_var_args():
             'tool_name': 'get_var_args',
             'content': '{"args": [1, 2, 3]}',
             'tool_call_id': IsStr(),
+            'metadata': None,
             'timestamp': IsStr() & IsNow(iso_string=True, tz=timezone.utc),  # type: ignore[reportUnknownMemberType]
             'part_kind': 'tool-return',
         }
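
The expected dict above gains a `'metadata': None` entry because the new `ToolReturnPart.metadata` field defaults to `None` and is included when the part is dumped. A rough sketch of that effect, assuming a plain field-by-field dump: `ToolReturnPartSketch` is a hypothetical, simplified stand-in (the timestamp is omitted) and `dataclasses.asdict` stands in for whatever serialization the test actually goes through.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ToolReturnPartSketch:
    """Hypothetical, simplified stand-in for ToolReturnPart (no timestamp)."""

    tool_name: str
    content: Any
    tool_call_id: str
    metadata: Any = None  # new field; defaults to None when a tool sets nothing
    part_kind: str = 'tool-return'


# Dumping a part built without metadata still yields the key, with value None.
dumped = asdict(ToolReturnPartSketch('get_var_args', '{"args": [1, 2, 3]}', 'call-1'))
```

Any snapshot that compares the full dumped part therefore needs the extra key, even for tools that never return a `ToolReturn`.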
