Skip to content

Conversation

@randombet
Copy link
Collaborator

@randombet randombet commented Oct 14, 2025

Why are these changes needed?

OAI Response API and new Gemini client support list[dict] in their output content field. See below for details.

OAI Response API output is a list of ResponseOutputItem. E.g. output of an image generation could be a list of two items, one text output and one encoded image string.
To support use cases like this, I would like to make AG2 message to messages.

Similarly, the content of OAI response API input (https://github.com/openai/openai-python/blob/main/src/openai/types/responses/response_input_item.py#L42 and https://github.com/openai/openai-python/blob/main/src/openai/types/responses/easy_input_message.py#L13) is a ResponseInputMessageContentList

Gemini Chat also support sending a list of messages https://github.com/googleapis/python-genai/blob/main/google/genai/chats.py#L227

Enable list[dict] type for message['content']
Add multimodal testing with list[dict] type for message['content'] for two-agent chat and groupchat with responses API.

Related issue number

Replace PR #2134

Checks

LLM test with
https://github.com/ag2ai/ag2/actions/runs/18505526861/job/52733291499

Skip LLM test with
https://github.com/ag2ai/ag2/actions/runs/18439791453

…tool_executor.py:149-157)

  - Issue: tool_message variable was not initialized before the loop, causing UnboundLocalError when no tool calls were present
  - Fix: Added tool_message = None initialization and improved condition checking with message.get('tool_calls') instead of 'tool_calls' in message
  - Marker-based filtering system in test/conftest.py is working correctly
  - No Google dependency import errors in OpenAI tests
  - Proper test isolation achieved between different LLM providers
@joggrbot
Copy link

joggrbot bot commented Oct 14, 2025

📝 Documentation Analysis

All docs are up to date! 🎉


✅ Latest commit analyzed: 37fcfb0 | Powered by Joggr

@claude
Copy link

claude bot commented Oct 15, 2025

Pull Request Review: Enable list[dict] type for message content

Summary

This PR adds support for multimodal content (list of dictionaries) in the message content field across AG2 chat APIs, enabling compatibility with OpenAI Response API and new Gemini client formats. The implementation is comprehensive and well-tested.

Strengths

Code Quality

  • Excellent abstraction: The content_str() utility function in autogen/code_utils.py:49 provides clean normalization
  • Comprehensive coverage: Changes properly handle multimodal content across all critical paths
  • Type safety: Proper type annotations added to ModelClientResponseProtocol
  • Defensive programming: Consistent type checking prevents runtime errors

Test Coverage

  • 1385+ lines of integration tests across 18 test functions
  • Real API testing with actual LLM APIs (OpenAI, Responses API)
  • Multiple scenarios: two-agent chat, group chat, sequential chats, tool execution, safeguards
  • Good test organization with reusable helper functions

Backward Compatibility

  • Non-breaking changes preserve existing string content handling
  • Graceful degradation with fallback to str() for unexpected types

Issues Found

Critical Bug (test_multimodal_integration.py:86)

Line 86 has duplicate image_url in list - should be image_url and input_image to match autogen/code_utils.py:82
Impact: Test wont properly validate input_image type content
Fix: Change the duplicate to input_image

Potential Enhancement (code_utils.py:79)

The content_str() function uses assertions for validation. Assertions can be disabled with Pythons -O flag. Consider using explicit if checks with ValueError instead for production safety.

Code Quality Observations

Good Practices

  1. Consistent type checking pattern before calling content_str()
  2. Error handling in SafeguardEnforcer with try/except
  3. Clear docstrings in test helper functions

Minor Improvements

  1. Redundant pattern: The conditional check before content_str() appears in 8+ locations but is redundant since content_str() already handles those types
  2. Workflow change at .github/workflows/core-llm-test.yml:63 adding fallback is good

Performance

No performance concerns. The content_str() function is efficient with simple list iteration and string joining.

Security

No security vulnerabilities identified. Input validation is appropriate with type checking and explicit allowed types. No injection risks. Safeguard enforcement properly handles multimodal content with masking.

Test Coverage

Excellent - 18 total tests covering two-agent chat, group chat, sequential chat, nested chat, tool execution, and guardrails. Both Chat Completion API and Responses API formats tested.

Suggestion: Add unit tests for content_str() function edge cases (empty list, unknown type, mixed content, non-dict items).

Recommendations

Must Fix:

  1. Fix duplicate image_url bug in test_multimodal_integration.py:86

Should Consider:
2. Replace assertions with explicit ValueError checks in content_str()
3. Simplify redundant type checks before content_str() calls

Nice to Have:
4. Add unit tests for content_str() function
5. Document supported content types in module docstring
6. Consider type alias for content types

Overall Assessment

Verdict: Approve with minor fixes (4.5/5)

Well-engineered PR with thorough implementation, proper edge case handling, and excellent test coverage. Code follows good practices with type annotations and defensive programming. The critical bug is minor and easy to fix. Great work on comprehensive testing and maintaining backward compatibility!

@claude
Copy link

claude bot commented Oct 16, 2025

Code Review for PR #2145: Enable list[dict] type for message['content']

Summary

This PR extends AG2's message handling to support multimodal content in the form of list[dict] for the message['content'] field, aligning with OpenAI's Response API and Gemini client formats. The implementation is comprehensive and well-tested.

Strengths

1. Excellent Test Coverage

  • Added 887 lines of integration tests covering two-agent conversations, group chats, sequential chats with carryover, tool execution, safeguards, and clear history
  • Tests verify both Chat Completion API and Responses API formats

2. Consistent Pattern Application & Backward Compatibility

  • Changes maintain backward compatibility with existing string-based content
  • Graceful handling of edge cases

Issues and Concerns

1. Code Duplication - High Priority

The normalization logic is repeated 10+ times across multiple files in chat.py, conversable_agent.py, groupchat.py, group_tool_executor.py, and safeguards/enforcer.py.

Recommendation: Extract this into a reusable helper function to reduce duplication.

2. Potential Type Inconsistency

In groupchat.py:1776-1786, the code modifies reply['content'] after normalizing, converting list[dict] to strings.

Recommendation: Verify this is intentional and document why mutation is necessary.

3. Inconsistent Error Handling

Most locations don't have try-except blocks around content_str() calls, but SafeguardEnforcer._stringify_content does.

4. Missing Type Hints

The _post_process_carryover_item function lacks type hints.

5. Workflow File Changes

The workflow file changes appear unrelated to the main feature - consider splitting into separate PR.

Security & Performance

  • No security concerns identified
  • Performance impact minimal

Recommendations Summary

High Priority:

  1. Extract normalization logic into helper function
  2. Verify mutation behavior in clear_agents_history

Medium Priority:
3. Add type hints
4. Clarify exception handling strategy
5. Update docstrings

Overall Assessment

Well-implemented feature with excellent test coverage. Main issue is code duplication, easily addressed.

Recommendation: Approve with requested changes.

Great work on comprehensive testing!

@claude
Copy link

claude bot commented Oct 16, 2025

PR Review: Enable list[dict] type for message['content']

Thank you for this comprehensive PR! This is a well-structured enhancement that enables multimodal content support for AG2's chat APIs.

✅ Strengths

  1. Excellent Test Coverage (1,385 lines of new tests)

    • Comprehensive multimodal integration tests covering two-agent chat, group chat, and sequential chat
    • Tests for both Chat Completion API and Responses API formats
    • Good edge case coverage with content_str processing verification
  2. Consistent Implementation Pattern

    • The PR follows a consistent pattern throughout the codebase for content normalization
    • Applied uniformly across all modified files
  3. Backward Compatibility

    • Changes are additive and maintain backward compatibility with existing string-based content

⚠️ Issues & Concerns

1. Code Duplication (High Priority)

The content normalization pattern is repeated ~15+ times across the codebase. Extract this into a helper function to improve maintainability.

Locations: chat.py:136-138, conversable_agent.py:1082-1087, conversable_agent.py:3984-3987, group_tool_executor.py:208, enforcer.py (multiple), groupchat.py (multiple)

2. Potential Bug: Message Mutation in groupchat.py (Medium Priority)

In groupchat.py:1709-1716 and groupchat.py:1779-1789, the code converts list[dict] content to str when removing termination strings or clearing history. This permanently converts multimodal content into a plain string, losing the structured format.

Impact: Breaks multimodal content after termination. May cause issues in resumed conversations.

Recommendation: Either preserve the list structure and only modify text parts, or document that this is intentional behavior.

3. Type Safety Issues (Medium Priority)

In conversable_agent.py:1354-1363, the change removed the isinstance check but doesn't validate that content is actually a valid type. Add validation for str, list, or None.

4. Error Handling in content_str (Low Priority)

The content_str function uses assertions for validation. Assertions can be disabled with python -O. Replace with proper exceptions.

5. Missing Documentation (Medium Priority)

  • No updates to documentation about the new multimodal content support
  • The PR description mentions updating docs but the checkbox is unchecked
  • Add docstring examples showing multimodal content usage

🔒 Security Assessment

✅ No major security concerns identified. The sanitization logic in safeguards/enforcer.py properly handles multimodal content.

📊 Test Coverage Assessment

Excellent coverage overall, but consider adding:

  1. Error case tests (invalid multimodal content structure)
  2. Edge case tests (empty content list, content list with only images)
  3. Integration tests (termination message handling with multimodal content)
  4. Unit tests for helpers (direct tests for content_str)

🎯 Recommendations Summary

Should Fix (High Priority):

  1. Extract repeated normalization logic into a helper function
  2. Fix or document the message mutation issue in groupchat.py
  3. Add type validation in _should_terminate_chat

Nice to Have:

  1. Replace assertions with proper exceptions in content_str
  2. Add documentation for multimodal content feature
  3. Add more edge case tests

📝 Additional Notes

  1. The workflow file change in .github/workflows/core-llm-test.yml:63 is a good defensive improvement
  2. The test fixture uses gpt-4.1-mini - verify this is intentional
  3. The PR properly handles both OpenAI Chat Completion API format and Responses API format - well-designed

✨ Overall Assessment

This is a well-executed PR that adds important multimodal support to AG2. The implementation is thorough, the test coverage is excellent, and the changes maintain backward compatibility.

Recommendation: Approve with requested changes

Great work @randombet! 🎉

@randombet randombet added this pull request to the merge queue Oct 16, 2025
Merged via the queue into main with commit 350caa2 Oct 16, 2025
24 checks passed
@randombet randombet deleted the content_testing branch October 16, 2025 18:03
@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 48.33333% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/agentchat/groupchat.py 34.61% 13 Missing and 4 partials ⚠️
autogen/agentchat/group/safeguards/enforcer.py 20.00% 12 Missing ⚠️
autogen/agentchat/chat.py 60.00% 1 Missing and 1 partial ⚠️
Files with missing lines Coverage Δ
autogen/agentchat/conversable_agent.py 62.50% <100.00%> (-1.60%) ⬇️
autogen/agentchat/group/group_tool_executor.py 88.78% <100.00%> (+1.16%) ⬆️
autogen/llm_config/client.py 100.00% <100.00%> (ø)
autogen/oai/client.py 55.29% <100.00%> (-3.93%) ⬇️
autogen/agentchat/chat.py 73.33% <60.00%> (-0.58%) ⬇️
autogen/agentchat/group/safeguards/enforcer.py 23.56% <20.00%> (+0.19%) ⬆️
autogen/agentchat/groupchat.py 65.37% <34.61%> (-5.24%) ⬇️

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude claude bot mentioned this pull request Oct 22, 2025
3 tasks
@marklysze marklysze changed the title [Feature] Enable list[dict] type for message['content'] for two-agent chat and group chat APIs feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants