Skip to content

feat: enhance MCP tool response handling to support image responses #5185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

shivangag
Copy link

@shivangag shivangag commented Jun 27, 2025

Related GitHub Issue

Fixes: #5233

Description

This PR enhances MCP (Model Context Protocol) tool response handling to support images alongside text content. The implementation allows MCP servers to return images in their tool responses, which are then properly displayed in the Roo Code UI.

Key implementation details:

Design choices:

  • Images are handled as part of the tool result structure, maintaining backward compatibility with text-only responses
  • The implementation supports both text-only, image-only, and combined text+image responses from MCP tools

Test Procedure

Unit Tests Added:

  • Added comprehensive tests in useMcpToolTool.spec.ts to verify:
    • Correct extraction of both text and images from tool results
    • Proper handling of image-only responses
    • Backward compatibility with text-only responses
  • Updated tests in combineCommandSequences.spec.ts to ensure images are preserved during command sequence processing

Manual Testing:

  1. Connect to an MCP server that supports image responses (e.g., browser MCP server with screenshot capability)
  2. Execute a tool that returns images (e.g., take a screenshot)
  3. Verify that both text and images are displayed correctly in the chat interface
  4. Test with various MCP tools to ensure backward compatibility with text-only responses

Reviewers can test by:

  1. Running the test suite: npm test -- useMcpToolTool.spec.ts
  2. Setting up an MCP server with image capabilities and testing the integration manually

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Before: MCP tool responses only displayed text content, images were not supported.
Screenshot 2025-06-27 at 2 32 26 PM

After: MCP tool responses now display both text and images seamlessly in the chat interface.
codicon-file-media indicates if images are present in mcp response and the number of images.
Screenshot 2025-06-27 at 2 18 30 PM
Images are presented using the Thumbnail component inside mcp-server-response block.
Screenshot 2025-06-27 at 2 19 29 PM

Documentation Updates

  • No documentation updates are required.
  • Yes, documentation updates are required. The MCP integration documentation should be updated to mention image support capabilities for tool responses.

Additional Notes

  • This enhancement maintains full backward compatibility with existing MCP servers that only return text
  • The implementation follows the existing patterns for handling MCP tool responses
  • Images are processed and displayed using the same infrastructure as other image content in Roo Code

Improvements

Old chat loaded from memory show tool call argument inside the response.
Screenshot 2025-06-27 at 2 19 44 PM


Important

Enhances MCP tool response handling to support image responses, updating processing logic, UI components, and adding new settings for image handling.

  • Behavior:
    • processToolContent() in useMcpToolTool.ts now extracts and returns both text and images from MCP tool results.
    • executeToolAndProcessResult() in useMcpToolTool.ts handles image data and passes it through the response pipeline.
    • combineCommandSequences() in combineCommandSequences.ts preserves images from MCP server responses.
    • UI components McpExecution.tsx and ChatRow.tsx updated to render images alongside text responses.
  • Settings:
    • Adds mcpMaxImagesPerResponse and mcpMaxImageSizeMB to global-settings.ts for image handling configuration.
  • Tests:
    • Adds tests in useMcpToolTool.spec.ts for image extraction and handling.
    • Updates combineCommandSequences.spec.ts to test image preservation in command sequences.
  • Misc:
    • Updates webviewMessageHandler.ts to handle new image settings.
    • Adds image handling logic to Thumbnails.tsx for UI display.

This description was created by Ellipsis for a416090. You can customize this summary. It will automatically update as commits are pushed.

**Changes:**
- Updated `processToolContent` to return both text and images from tool results.
- Modified `executeToolAndProcessResult` to handle and pass images to the response.
- Adjusted `combineCommandSequences` to preserve images from MCP server responses.
- Updated UI components to display images alongside text responses.

**Testing:**
- Added tests to verify correct handling of tool results with text and images.
- Ensured that image-only responses are processed correctly.

**Files Modified:**
- `src/core/prompts/responses.ts`
- `src/core/tools/useMcpToolTool.ts`
- `src/shared/combineCommandSequences.ts`
- `webview-ui/src/components/chat/McpExecution.tsx`
- `webview-ui/src/components/chat/ChatRow.tsx`
- Test files for MCP tool functionality.
@shivangag shivangag requested review from mrubens, cte and jr as code owners June 27, 2025 09:10
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request labels Jun 27, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jun 27, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 27, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jun 27, 2025
Copy link
Collaborator

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @shivangag,

I noticed this PR doesn't have a linked GitHub issue. As per the contributing guidelines, all PRs need to be linked to an approved issue. Could you create an issue describing this enhancement and link it to this PR? This helps us keep things traceable and ensures features are properly discussed before implementation.

I'll mark this PR as "In progress" for now until the dev team has a chance to review the issue.

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Jun 27, 2025
@daniel-lxs daniel-lxs marked this pull request as draft June 27, 2025 20:31
**Changes:**
- Improved validation for base64 image data in `processToolContent` to handle invalid and non-string data gracefully.
- Added error handling to log warnings for corrupted images without interrupting processing.
- Updated tests to verify correct behavior when encountering invalid base64 data and non-string inputs.

**Files Modified:**
- `src/core/tools/useMcpToolTool.ts`
- `src/core/tools/__tests__/useMcpToolTool.spec.ts`
- `webview-ui/src/components/common/Thumbnails.tsx`
**Changes:**
- Introduced `mcpMaxImagesPerResponse` and `mcpMaxImageSizeMB` settings to control the maximum number of images and their size in MCP tool responses.
- Updated `processToolContent` to enforce these limits, logging warnings when they are exceeded.
- Enhanced UI components to allow users to configure these settings.
- Added tests to verify correct behavior under various image limits and sizes.

**Files Modified:**
- `packages/types/src/global-settings.ts`
- `src/core/tools/useMcpToolTool.ts`
- `src/core/webview/ClineProvider.ts`
- `src/core/webview/webviewMessageHandler.ts`
- `src/shared/ExtensionMessage.ts`
- `src/shared/WebviewMessage.ts`
- `webview-ui/src/components/mcp/McpView.tsx`
- `webview-ui/src/context/ExtensionStateContext.tsx`
- `webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx`
- Test files for MCP tool functionality.
**Changes:**
- Introduced a constant `SUPPORTED_IMAGE_TYPES` to define valid image MIME types, improving code readability and maintainability.
- Updated `processToolContent` to utilize the new constant for image type validation.

**Files Modified:**
- `src/core/tools/useMcpToolTool.ts`
…ypes

**Changes:**
- Added tests to ensure that the MCP tool correctly ignores images with unsupported MIME types and malformed content.
- Implemented console warnings for unsupported MIME types and missing properties in image content.

**Files Modified:**
- `src/core/tools/__tests__/useMcpToolTool.spec.ts`
@shivangag
Copy link
Author

shivangag commented Jun 29, 2025

Thank you @daniel-lxs for the thorough review and excellent suggestions! I've implemented all the recommended enhancements to make MCP image handling more robust and secure.

Security & Performance Controls

  • Image size limits: Added configurable mcpMaxImageSizeMB setting with 10MB default
  • Count limits: Added configurable mcpMaxImagesPerResponse setting with 20 image default
  • User controls: Both settings are configurable through the MCP settings panel

Robustness Improvements

  • Error handling: Wrapped image processing in try-catch blocks to prevent crashes from corrupted data
  • Validation: Added base64 regex validation and type checking for data integrity
  • Graceful degradation: Invalid images are skipped with detailed warning logs instead of failing

Code Quality Enhancements

  • Maintainability: Extracted SUPPORTED_IMAGE_TYPES as a module-level constant
  • Test coverage: Added comprehensive test suite covering all suggested edge cases including corrupted data, large images, unsupported MIME types, and malformed content

The implementation now safely handles potentially malicious or buggy MCP servers while maintaining full backward compatibility with existing functionality. I believe this addresses all the security, performance, and reliability concerns raised in the review.

Also created GitHub issue #5233 to properly document this feature enhancement.

@shivangag shivangag requested a review from daniel-lxs July 4, 2025 23:21
@shivangag shivangag marked this pull request as ready for review July 7, 2025 11:37
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jul 7, 2025
@daniel-lxs daniel-lxs moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Jul 7, 2025
Copy link
Collaborator

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @shivangag, Thank you for taking a look at my previous suggestions. I took another look and left some new suggestions from my review. Let me know if you have any questions!

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Jul 8, 2025
- Add input validation for max images per response (1-100 range)
- Add input validation for max image size in MB (1-100 range)
- Display error messages for invalid inputs
- Internationalize all image settings labels and descriptions
- Update translations across all supported locales
@shivangag shivangag requested a review from daniel-lxs July 8, 2025 23:27
@daniel-lxs daniel-lxs moved this from PR [Changes Requested] to PR [Needs Prelim Review] in Roo Code Roadmap Jul 11, 2025
Copy link
Collaborator

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @shivangag, sorry for the delay, the implementation looks solid, I left a couple of suggestions to handle potential issues.

Let me know what you think!

}

// Check image size
const imageSizeMB = calculateImageSizeMB(item.data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image size calculation happens after the entire base64 string is already in memory. For very large images, this could cause memory spikes before the size check rejects them.

Consider adding an early size check based on string length:

// Quick size check before full validation
const approximateSizeMB = (item.data.length * 0.75) / (1024 * 1024);
if (approximateSizeMB > maxImageSizeMB * 1.5) { // Allow some margin for encoding overhead
    console.warn(`MCP image likely exceeds size limit based on string length`);
    return;
}

This would prevent processing extremely large strings that are obviously over the limit.

cursor: "pointer",
}}
onClick={() => handleImageClick(image)}
title="Failed to load image">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hardcoded English string needs to be translated. Please add a translation key for "Failed to load image".

Suggested fix:

title={t("common:thumbnails.failedToLoad")}

Don't forget to add the translation to all locale files.

) : (
<img
src={image}
alt={`Thumbnail ${index + 1}`}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alt text contains a hardcoded English string with string interpolation. This needs to be translated.

Suggested fix:

alt={t("common:thumbnails.altText", { index: index + 1 })}

The translation key should support interpolation, e.g., "Thumbnail {{index}}"


// Get MCP settings from the extension's global state
const state = await cline.providerRef.deref()?.getState()
const maxImagesPerResponse = state?.mcpMaxImagesPerResponse ?? 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation uses the settings values without validating whether they're within reasonable bounds. For example, if someone manually edits their settings file and sets a negative number or an extremely large value, it could lead to unexpected behavior.

It might be worth adding validation like this:

const maxImagesPerResponse = Math.max(1, Math.min(100, state?.mcpMaxImagesPerResponse ?? 20));
const maxImageSizeMB = Math.max(0.1, Math.min(50, state?.mcpMaxImageSizeMB ?? 10));

This helps keep the values within safe operational limits, even in cases where the settings file is corrupted or manually modified.

/>
</Button>
)}
{(responseText && responseText.length > 0) || (images && images.length > 0) ? (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image count indicator is shown even when the response is collapsed. Is this intentional? It might be confusing to show a count for content that isn't visible.

Maybe you could:

  1. Only showing the count when expanded
  2. Adding a tooltip to explain what the count represents
  3. Using different styling to indicate collapsed content with images

const maxImagesPerResponse = state?.mcpMaxImagesPerResponse ?? 20
const maxImageSizeMB = state?.mcpMaxImageSizeMB ?? 10

toolResult.content.forEach((item: any) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Images are validated sequentially in the forEach loop. For responses with many images, this could be slow.

Consider batching validation or using Promise.all for parallel processing:

const validatedImages = await Promise.all(
  toolResult.content
    .filter((item: any) => item.type === "image")
    .map(async (item: any) => validateAndProcessImage(item))
);
images.push(...validatedImages.filter(Boolean));

This would improve performance for MCP tools that return multiple images.

}

// Basic validation for base64 format
const base64Regex = /^[A-Za-z0-9+/]*={0,2}$/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base64 validation regex could be extracted as a constant for better maintainability and potential reuse:

const BASE64_REGEX = /^[A-Za-z0-9+/]*={0,2}$/;

This would make it easier to maintain and test the validation pattern.

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request PR - Changes Requested size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: PR [Changes Requested]
Development

Successfully merging this pull request may close these issues.

Feature: Enhance MCP Image Handling with Image Support, Robustness, and Security Controls
3 participants