feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2145

randombet · 2025-10-14T18:50:36Z

Why are these changes needed?

OAI Response API and new Gemini client support list[dict] in their output content field. See below for details.

OAI Response API output is a list of ResponseOutputItem. E.g. output of an image generation could be a list of two items, one text output and one encoded image string.
To support use cases like this, I would like to make AG2 message to messages.

Similarly, the content of OAI response API input (https://github.com/openai/openai-python/blob/main/src/openai/types/responses/response_input_item.py#L42 and https://github.com/openai/openai-python/blob/main/src/openai/types/responses/easy_input_message.py#L13) is a ResponseInputMessageContentList

Gemini Chat also support sending a list of messages https://github.com/googleapis/python-genai/blob/main/google/genai/chats.py#L227

Enable list[dict] type for message['content']
Add multimodal testing with list[dict] type for message['content'] for two-agent chat and groupchat with responses API.

Related issue number

Replace PR #2134

Checks

LLM test with
https://github.com/ag2ai/ag2/actions/runs/18505526861/job/52733291499

Skip LLM test with
https://github.com/ag2ai/ag2/actions/runs/18439791453

I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

…group chat APIs

…tool_executor.py:149-157) - Issue: tool_message variable was not initialized before the loop, causing UnboundLocalError when no tool calls were present - Fix: Added tool_message = None initialization and improved condition checking with message.get('tool_calls') instead of 'tool_calls' in message

- Marker-based filtering system in test/conftest.py is working correctly - No Google dependency import errors in OpenAI tests - Proper test isolation achieved between different LLM providers

joggrbot · 2025-10-14T18:50:51Z

📝 Documentation Analysis

All docs are up to date! 🎉

✅ Latest commit analyzed: 37fcfb0 | Powered by Joggr

claude · 2025-10-15T04:44:25Z

Pull Request Review: Enable list[dict] type for message content

Summary

This PR adds support for multimodal content (list of dictionaries) in the message content field across AG2 chat APIs, enabling compatibility with OpenAI Response API and new Gemini client formats. The implementation is comprehensive and well-tested.

Strengths

Code Quality

Excellent abstraction: The content_str() utility function in autogen/code_utils.py:49 provides clean normalization
Comprehensive coverage: Changes properly handle multimodal content across all critical paths
Type safety: Proper type annotations added to ModelClientResponseProtocol
Defensive programming: Consistent type checking prevents runtime errors

Test Coverage

1385+ lines of integration tests across 18 test functions
Real API testing with actual LLM APIs (OpenAI, Responses API)
Multiple scenarios: two-agent chat, group chat, sequential chats, tool execution, safeguards
Good test organization with reusable helper functions

Backward Compatibility

Non-breaking changes preserve existing string content handling
Graceful degradation with fallback to str() for unexpected types

Issues Found

Critical Bug (test_multimodal_integration.py:86)

Line 86 has duplicate image_url in list - should be image_url and input_image to match autogen/code_utils.py:82
Impact: Test wont properly validate input_image type content
Fix: Change the duplicate to input_image

Potential Enhancement (code_utils.py:79)

The content_str() function uses assertions for validation. Assertions can be disabled with Pythons -O flag. Consider using explicit if checks with ValueError instead for production safety.

Code Quality Observations

Good Practices

Consistent type checking pattern before calling content_str()
Error handling in SafeguardEnforcer with try/except
Clear docstrings in test helper functions

Minor Improvements

Redundant pattern: The conditional check before content_str() appears in 8+ locations but is redundant since content_str() already handles those types
Workflow change at .github/workflows/core-llm-test.yml:63 adding fallback is good

Performance

No performance concerns. The content_str() function is efficient with simple list iteration and string joining.

Security

No security vulnerabilities identified. Input validation is appropriate with type checking and explicit allowed types. No injection risks. Safeguard enforcement properly handles multimodal content with masking.

Test Coverage

Excellent - 18 total tests covering two-agent chat, group chat, sequential chat, nested chat, tool execution, and guardrails. Both Chat Completion API and Responses API formats tested.

Suggestion: Add unit tests for content_str() function edge cases (empty list, unknown type, mixed content, non-dict items).

Recommendations

Must Fix:

Fix duplicate image_url bug in test_multimodal_integration.py:86

Should Consider:
2. Replace assertions with explicit ValueError checks in content_str()
3. Simplify redundant type checks before content_str() calls

Nice to Have:
4. Add unit tests for content_str() function
5. Document supported content types in module docstring
6. Consider type alias for content types

Overall Assessment

Verdict: Approve with minor fixes (4.5/5)

Well-engineered PR with thorough implementation, proper edge case handling, and excellent test coverage. Code follows good practices with type annotations and defensive programming. The critical bug is minor and easy to fix. Great work on comprehensive testing and maintaining backward compatibility!

autogen/agentchat/group/group_tool_executor.py

test/agentchat/test_multimodal_integration.py

claude · 2025-10-16T04:37:18Z

Code Review for PR #2145: Enable list[dict] type for message['content']

Summary

This PR extends AG2's message handling to support multimodal content in the form of list[dict] for the message['content'] field, aligning with OpenAI's Response API and Gemini client formats. The implementation is comprehensive and well-tested.

Strengths

1. Excellent Test Coverage

Added 887 lines of integration tests covering two-agent conversations, group chats, sequential chats with carryover, tool execution, safeguards, and clear history
Tests verify both Chat Completion API and Responses API formats

2. Consistent Pattern Application & Backward Compatibility

Changes maintain backward compatibility with existing string-based content
Graceful handling of edge cases

Issues and Concerns

1. Code Duplication - High Priority

The normalization logic is repeated 10+ times across multiple files in chat.py, conversable_agent.py, groupchat.py, group_tool_executor.py, and safeguards/enforcer.py.

Recommendation: Extract this into a reusable helper function to reduce duplication.

2. Potential Type Inconsistency

In groupchat.py:1776-1786, the code modifies reply['content'] after normalizing, converting list[dict] to strings.

Recommendation: Verify this is intentional and document why mutation is necessary.

3. Inconsistent Error Handling

Most locations don't have try-except blocks around content_str() calls, but SafeguardEnforcer._stringify_content does.

4. Missing Type Hints

The _post_process_carryover_item function lacks type hints.

5. Workflow File Changes

The workflow file changes appear unrelated to the main feature - consider splitting into separate PR.

Security & Performance

No security concerns identified
Performance impact minimal

Recommendations Summary

High Priority:

Extract normalization logic into helper function
Verify mutation behavior in clear_agents_history

Medium Priority:
3. Add type hints
4. Clarify exception handling strategy
5. Update docstrings

Overall Assessment

Well-implemented feature with excellent test coverage. Main issue is code duplication, easily addressed.

Recommendation: Approve with requested changes.

Great work on comprehensive testing!

claude · 2025-10-16T04:38:06Z

PR Review: Enable list[dict] type for message['content']

Thank you for this comprehensive PR! This is a well-structured enhancement that enables multimodal content support for AG2's chat APIs.

✅ Strengths

Excellent Test Coverage (1,385 lines of new tests)
- Comprehensive multimodal integration tests covering two-agent chat, group chat, and sequential chat
- Tests for both Chat Completion API and Responses API formats
- Good edge case coverage with content_str processing verification
Consistent Implementation Pattern
- The PR follows a consistent pattern throughout the codebase for content normalization
- Applied uniformly across all modified files
Backward Compatibility
- Changes are additive and maintain backward compatibility with existing string-based content

⚠️ Issues & Concerns

1. Code Duplication (High Priority)

The content normalization pattern is repeated ~15+ times across the codebase. Extract this into a helper function to improve maintainability.

Locations: chat.py:136-138, conversable_agent.py:1082-1087, conversable_agent.py:3984-3987, group_tool_executor.py:208, enforcer.py (multiple), groupchat.py (multiple)

2. Potential Bug: Message Mutation in groupchat.py (Medium Priority)

In groupchat.py:1709-1716 and groupchat.py:1779-1789, the code converts list[dict] content to str when removing termination strings or clearing history. This permanently converts multimodal content into a plain string, losing the structured format.

Impact: Breaks multimodal content after termination. May cause issues in resumed conversations.

Recommendation: Either preserve the list structure and only modify text parts, or document that this is intentional behavior.

3. Type Safety Issues (Medium Priority)

In conversable_agent.py:1354-1363, the change removed the isinstance check but doesn't validate that content is actually a valid type. Add validation for str, list, or None.

4. Error Handling in content_str (Low Priority)

The content_str function uses assertions for validation. Assertions can be disabled with python -O. Replace with proper exceptions.

5. Missing Documentation (Medium Priority)

No updates to documentation about the new multimodal content support
The PR description mentions updating docs but the checkbox is unchecked
Add docstring examples showing multimodal content usage

🔒 Security Assessment

✅ No major security concerns identified. The sanitization logic in safeguards/enforcer.py properly handles multimodal content.

📊 Test Coverage Assessment

Excellent coverage overall, but consider adding:

Error case tests (invalid multimodal content structure)
Edge case tests (empty content list, content list with only images)
Integration tests (termination message handling with multimodal content)
Unit tests for helpers (direct tests for content_str)

🎯 Recommendations Summary

Should Fix (High Priority):

Extract repeated normalization logic into a helper function
Fix or document the message mutation issue in groupchat.py
Add type validation in _should_terminate_chat

Nice to Have:

Replace assertions with proper exceptions in content_str
Add documentation for multimodal content feature
Add more edge case tests

📝 Additional Notes

The workflow file change in .github/workflows/core-llm-test.yml:63 is a good defensive improvement
The test fixture uses gpt-4.1-mini - verify this is intentional
The PR properly handles both OpenAI Chat Completion API format and Responses API format - well-designed

✨ Overall Assessment

This is a well-executed PR that adds important multimodal support to AG2. The implementation is thorough, the test coverage is excellent, and the changes maintain backward compatibility.

Recommendation: Approve with requested changes

Great work @randombet! 🎉

Not valid

codecov · 2025-10-16T18:04:42Z

Codecov Report

❌ Patch coverage is 48.33333% with 31 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
autogen/agentchat/groupchat.py	34.61%	13 Missing and 4 partials ⚠️
autogen/agentchat/group/safeguards/enforcer.py	20.00%	12 Missing ⚠️
autogen/agentchat/chat.py	60.00%	1 Missing and 1 partial ⚠️

Files with missing lines	Coverage Δ
autogen/agentchat/conversable_agent.py	`62.50% <100.00%> (-1.60%)`	⬇️
autogen/agentchat/group/group_tool_executor.py	`88.78% <100.00%> (+1.16%)`	⬆️
autogen/llm_config/client.py	`100.00% <100.00%> (ø)`
autogen/oai/client.py	`55.29% <100.00%> (-3.93%)`	⬇️
autogen/agentchat/chat.py	`73.33% <60.00%> (-0.58%)`	⬇️
autogen/agentchat/group/safeguards/enforcer.py	`23.56% <20.00%> (+0.19%)`	⬆️
autogen/agentchat/groupchat.py	`65.37% <34.61%> (-5.24%)`	⬇️

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

randombet added 12 commits October 2, 2025 23:42

Enable list[dict] type for message['content'] for two-agent chat and …

e8e335a

…group chat APIs

Update test marker

9605fd7

Add back test_custom_client.py

5808dcd

skip test_multimodal_integration.py from no llm test

82dc1f5

Merge branch 'main' into gemini_client_update

17d7075

Update skip llm test

bf51bcb

update

f36714f

Update e2e test

426bf63

enable manual picked branch for workflow

db876c1

Verified Core Import Isolation

5f0d640

- Marker-based filtering system in test/conftest.py is working correctly - No Google dependency import errors in OpenAI tests - Proper test isolation achieved between different LLM providers

Add responses client to testing config

91bae0b

randombet requested review from marklysze and priyansh4320 October 14, 2025 18:51

Merge branch 'main' into content_testing

2b4e77b

randombet mentioned this pull request Oct 14, 2025

[Feature] Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2134

Closed

3 tasks

randombet requested a review from Lancetnik October 15, 2025 03:22

Merge branch 'main' into content_testing

502c0d7

priyansh4320 previously requested changes Oct 15, 2025

View reviewed changes

autogen/agentchat/group/group_tool_executor.py Show resolved Hide resolved

test/agentchat/test_multimodal_integration.py Outdated Show resolved Hide resolved

Lancetnik reviewed Oct 15, 2025

View reviewed changes

test/agentchat/test_multimodal_integration.py Show resolved Hide resolved

Lancetnik reviewed Oct 15, 2025

View reviewed changes

test/agentchat/test_multimodal_integration.py Show resolved Hide resolved

Lancetnik approved these changes Oct 15, 2025

View reviewed changes

randombet and others added 3 commits October 15, 2025 21:33

Fix test

1333f94

Disable Claude code review on Draft PRs (#2147)

dc1ed5d

Merge branch 'main' into content_testing

37fcfb0

priyansh4320 approved these changes Oct 16, 2025

View reviewed changes

randombet added this pull request to the merge queue Oct 16, 2025

Merged via the queue into main with commit 350caa2 Oct 16, 2025
24 checks passed

randombet deleted the content_testing branch October 16, 2025 18:03

randombet mentioned this pull request Oct 16, 2025

chore: Remove custom client multimodal tests #2151

Merged

3 tasks

claude bot mentioned this pull request Oct 22, 2025

Bump version to 0.10 #2162

Merged

3 tasks

marklysze changed the title ~~[Feature] Enable list[dict] type for message['content'] for two-agent chat and group chat APIs~~ feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs Oct 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2145

feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2145

Uh oh!

randombet commented Oct 14, 2025 •

edited

Loading

Uh oh!

joggrbot bot commented Oct 14, 2025 •

edited

Loading

Uh oh!

claude bot commented Oct 15, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude bot commented Oct 16, 2025

Uh oh!

claude bot commented Oct 16, 2025

Uh oh!

Uh oh!

codecov bot commented Oct 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2145

feat: Enable list[dict] type for message['content'] for two-agent chat and group chat APIs #2145

Uh oh!

Conversation

randombet commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are these changes needed?

Related issue number

Checks

Uh oh!

joggrbot bot commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📝 Documentation Analysis

Uh oh!

claude bot commented Oct 15, 2025

Pull Request Review: Enable list[dict] type for message content

Summary

Strengths

Code Quality

Test Coverage

Backward Compatibility

Issues Found

Critical Bug (test_multimodal_integration.py:86)

Potential Enhancement (code_utils.py:79)

Code Quality Observations

Good Practices

Minor Improvements

Performance

Security

Test Coverage

Recommendations

Overall Assessment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude bot commented Oct 16, 2025

Code Review for PR #2145: Enable list[dict] type for message['content']

Summary

Strengths

1. Excellent Test Coverage

2. Consistent Pattern Application & Backward Compatibility

Issues and Concerns

1. Code Duplication - High Priority

2. Potential Type Inconsistency

3. Inconsistent Error Handling

4. Missing Type Hints

5. Workflow File Changes

Security & Performance

Recommendations Summary

Overall Assessment

Uh oh!

claude bot commented Oct 16, 2025

PR Review: Enable list[dict] type for message['content']

✅ Strengths

⚠️ Issues & Concerns

1. Code Duplication (High Priority)

2. Potential Bug: Message Mutation in groupchat.py (Medium Priority)

3. Type Safety Issues (Medium Priority)

4. Error Handling in content_str (Low Priority)

5. Missing Documentation (Medium Priority)

🔒 Security Assessment

📊 Test Coverage Assessment

🎯 Recommendations Summary

📝 Additional Notes

✨ Overall Assessment

Uh oh!

Uh oh!

codecov bot commented Oct 16, 2025

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

randombet commented Oct 14, 2025 •

edited

Loading

joggrbot bot commented Oct 14, 2025 •

edited

Loading