-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: enhance MCP tool response handling to support image responses #5185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
**Changes:** - Updated `processToolContent` to return both text and images from tool results. - Modified `executeToolAndProcessResult` to handle and pass images to the response. - Adjusted `combineCommandSequences` to preserve images from MCP server responses. - Updated UI components to display images alongside text responses. **Testing:** - Added tests to verify correct handling of tool results with text and images. - Ensured that image-only responses are processed correctly. **Files Modified:** - `src/core/prompts/responses.ts` - `src/core/tools/useMcpToolTool.ts` - `src/shared/combineCommandSequences.ts` - `webview-ui/src/components/chat/McpExecution.tsx` - `webview-ui/src/components/chat/ChatRow.tsx` - Test files for MCP tool functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shivangag,
I noticed this PR doesn't have a linked GitHub issue. As per the contributing guidelines, all PRs need to be linked to an approved issue. Could you create an issue describing this enhancement and link it to this PR? This helps us keep things traceable and ensures features are properly discussed before implementation.
I'll mark this PR as "In progress" for now until the dev team has a chance to review the issue.
**Changes:** - Improved validation for base64 image data in `processToolContent` to handle invalid and non-string data gracefully. - Added error handling to log warnings for corrupted images without interrupting processing. - Updated tests to verify correct behavior when encountering invalid base64 data and non-string inputs. **Files Modified:** - `src/core/tools/useMcpToolTool.ts` - `src/core/tools/__tests__/useMcpToolTool.spec.ts` - `webview-ui/src/components/common/Thumbnails.tsx`
**Changes:** - Introduced `mcpMaxImagesPerResponse` and `mcpMaxImageSizeMB` settings to control the maximum number of images and their size in MCP tool responses. - Updated `processToolContent` to enforce these limits, logging warnings when they are exceeded. - Enhanced UI components to allow users to configure these settings. - Added tests to verify correct behavior under various image limits and sizes. **Files Modified:** - `packages/types/src/global-settings.ts` - `src/core/tools/useMcpToolTool.ts` - `src/core/webview/ClineProvider.ts` - `src/core/webview/webviewMessageHandler.ts` - `src/shared/ExtensionMessage.ts` - `src/shared/WebviewMessage.ts` - `webview-ui/src/components/mcp/McpView.tsx` - `webview-ui/src/context/ExtensionStateContext.tsx` - `webview-ui/src/context/__tests__/ExtensionStateContext.spec.tsx` - Test files for MCP tool functionality.
**Changes:** - Introduced a constant `SUPPORTED_IMAGE_TYPES` to define valid image MIME types, improving code readability and maintainability. - Updated `processToolContent` to utilize the new constant for image type validation. **Files Modified:** - `src/core/tools/useMcpToolTool.ts`
…ypes **Changes:** - Added tests to ensure that the MCP tool correctly ignores images with unsupported MIME types and malformed content. - Implemented console warnings for unsupported MIME types and missing properties in image content. **Files Modified:** - `src/core/tools/__tests__/useMcpToolTool.spec.ts`
Thank you @daniel-lxs for the thorough review and excellent suggestions! I've implemented all the recommended enhancements to make MCP image handling more robust and secure. Security & Performance Controls
Robustness Improvements
Code Quality Enhancements
The implementation now safely handles potentially malicious or buggy MCP servers while maintaining full backward compatibility with existing functionality. I believe this addresses all the security, performance, and reliability concerns raised in the review. Also created GitHub issue #5233 to properly document this feature enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shivangag, Thank you for taking a look at my previous suggestions. I took another look and left some new suggestions from my review. Let me know if you have any questions!
- Add input validation for max images per response (1-100 range) - Add input validation for max image size in MB (1-100 range) - Display error messages for invalid inputs - Internationalize all image settings labels and descriptions - Update translations across all supported locales
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shivangag, sorry for the delay, the implementation looks solid, I left a couple of suggestions to handle potential issues.
Let me know what you think!
} | ||
|
||
// Check image size | ||
const imageSizeMB = calculateImageSizeMB(item.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image size calculation happens after the entire base64 string is already in memory. For very large images, this could cause memory spikes before the size check rejects them.
Consider adding an early size check based on string length:
// Quick size check before full validation
const approximateSizeMB = (item.data.length * 0.75) / (1024 * 1024);
if (approximateSizeMB > maxImageSizeMB * 1.5) { // Allow some margin for encoding overhead
console.warn(`MCP image likely exceeds size limit based on string length`);
return;
}
This would prevent processing extremely large strings that are obviously over the limit.
cursor: "pointer", | ||
}} | ||
onClick={() => handleImageClick(image)} | ||
title="Failed to load image"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hardcoded English string needs to be translated. Please add a translation key for "Failed to load image".
Suggested fix:
title={t("common:thumbnails.failedToLoad")}
Don't forget to add the translation to all locale files.
) : ( | ||
<img | ||
src={image} | ||
alt={`Thumbnail ${index + 1}`} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alt text contains a hardcoded English string with string interpolation. This needs to be translated.
Suggested fix:
alt={t("common:thumbnails.altText", { index: index + 1 })}
The translation key should support interpolation, e.g., "Thumbnail {{index}}"
|
||
// Get MCP settings from the extension's global state | ||
const state = await cline.providerRef.deref()?.getState() | ||
const maxImagesPerResponse = state?.mcpMaxImagesPerResponse ?? 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation uses the settings values without validating whether they're within reasonable bounds. For example, if someone manually edits their settings file and sets a negative number or an extremely large value, it could lead to unexpected behavior.
It might be worth adding validation like this:
const maxImagesPerResponse = Math.max(1, Math.min(100, state?.mcpMaxImagesPerResponse ?? 20));
const maxImageSizeMB = Math.max(0.1, Math.min(50, state?.mcpMaxImageSizeMB ?? 10));
This helps keep the values within safe operational limits, even in cases where the settings file is corrupted or manually modified.
/> | ||
</Button> | ||
)} | ||
{(responseText && responseText.length > 0) || (images && images.length > 0) ? ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image count indicator is shown even when the response is collapsed. Is this intentional? It might be confusing to show a count for content that isn't visible.
Maybe you could:
- Only showing the count when expanded
- Adding a tooltip to explain what the count represents
- Using different styling to indicate collapsed content with images
const maxImagesPerResponse = state?.mcpMaxImagesPerResponse ?? 20 | ||
const maxImageSizeMB = state?.mcpMaxImageSizeMB ?? 10 | ||
|
||
toolResult.content.forEach((item: any) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Images are validated sequentially in the forEach loop. For responses with many images, this could be slow.
Consider batching validation or using Promise.all for parallel processing:
const validatedImages = await Promise.all(
toolResult.content
.filter((item: any) => item.type === "image")
.map(async (item: any) => validateAndProcessImage(item))
);
images.push(...validatedImages.filter(Boolean));
This would improve performance for MCP tools that return multiple images.
} | ||
|
||
// Basic validation for base64 format | ||
const base64Regex = /^[A-Za-z0-9+/]*={0,2}$/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base64 validation regex could be extracted as a constant for better maintainability and potential reuse:
const BASE64_REGEX = /^[A-Za-z0-9+/]*={0,2}$/;
This would make it easier to maintain and test the validation pattern.
Related GitHub Issue
Fixes: #5233
Description
This PR enhances MCP (Model Context Protocol) tool response handling to support images alongside text content. The implementation allows MCP servers to return images in their tool responses, which are then properly displayed in the Roo Code UI.
Key implementation details:
processToolContent()
to extract and return both text and images from MCP tool resultsexecuteToolAndProcessResult()
to handle image data and pass it through the response pipelinecombineCommandSequences()
to preserve images from MCP server responses during command sequence processingMcpExecution.tsx
andChatRow.tsx
) to render images alongside text responsesDesign choices:
Test Procedure
Unit Tests Added:
useMcpToolTool.spec.ts
to verify:combineCommandSequences.spec.ts
to ensure images are preserved during command sequence processingManual Testing:
Reviewers can test by:
npm test -- useMcpToolTool.spec.ts
Pre-Submission Checklist
Screenshots / Videos
Before: MCP tool responses only displayed text content, images were not supported.

After: MCP tool responses now display both text and images seamlessly in the chat interface.


codicon-file-media
indicates if images are present in mcp response and the number of images.Images are presented using the
Thumbnail
component insidemcp-server-response
block.Documentation Updates
Additional Notes
Improvements
Old chat loaded from memory show tool call argument inside the response.

Important
Enhances MCP tool response handling to support image responses, updating processing logic, UI components, and adding new settings for image handling.
processToolContent()
inuseMcpToolTool.ts
now extracts and returns both text and images from MCP tool results.executeToolAndProcessResult()
inuseMcpToolTool.ts
handles image data and passes it through the response pipeline.combineCommandSequences()
incombineCommandSequences.ts
preserves images from MCP server responses.McpExecution.tsx
andChatRow.tsx
updated to render images alongside text responses.mcpMaxImagesPerResponse
andmcpMaxImageSizeMB
toglobal-settings.ts
for image handling configuration.useMcpToolTool.spec.ts
for image extraction and handling.combineCommandSequences.spec.ts
to test image preservation in command sequences.webviewMessageHandler.ts
to handle new image settings.Thumbnails.tsx
for UI display.This description was created by
for a416090. You can customize this summary. It will automatically update as commits are pushed.