Add media file reading in filesystem server #2382

Draft
wants to merge 7 commits into
base: main


cliffhall
Member

@cliffhall cliffhall commented Jul 18, 2025

SEE ADDITIONAL CONTEXT SECTION AT BOTTOM FOR INFO ON SDK ERROR ON LARGE FILES. IN DRAFT WHILE INVESTIGATING.

Summary of Changes

This pull request renames the existing text file reading tool to make its purpose clear, and introduces a dedicated tool for binary media files. The new tool returns image and audio data as base64, so clients can read media files directly instead of being limited to text.

Highlights

  • Tool Renaming: The existing read_file tool has been renamed to read_text_file to more accurately reflect its function of reading text-based file content.
  • New Media Handling Tool: A new tool, read_media_file, has been introduced. This tool is designed to read image and audio files and return their content as base64 encoded data, along with the appropriate MIME type.
  • Dependency Update: The @modelcontextprotocol/sdk dependency has been updated to version 1.16.0, which also brings in a new transitive dependency, eventsource-parser.
  • Documentation Updates: The filesystem README has been updated to document the renamed read_text_file tool and provide details for the newly added read_media_file tool.

Changelog

  • package-lock.json
    • Updated @modelcontextprotocol/sdk from 1.12.3 to 1.16.0.
    • Added eventsource-parser as a new transitive dependency.
  • src/filesystem/README.md
    • Renamed documentation for read_file to read_text_file.
    • Updated description for read_text_file to specify text-only reading and mention head/tail parameters.
    • Added new documentation for the read_media_file tool, detailing its purpose and input parameters.
  • src/filesystem/index.ts
    • Imported createReadStream for streaming file content (line 13).
    • Renamed ReadFileArgsSchema to ReadTextFileArgsSchema (line 120).
    • Added ReadMediaFileArgsSchema for the new media tool (line 126).
    • Introduced readFileAsBase64Stream utility function to efficiently read and base64 encode file content (lines 482-495).
    • Updated the ListToolsRequestSchema handler to rename read_file to read_text_file and include the new read_media_file tool definition (lines 502-519).
    • Modified the CallToolRequestSchema handler to process read_text_file calls under its new name and added a new case for read_media_file to handle media file reading, MIME type detection, and base64 encoding (lines 631-694).
  • src/filesystem/package.json
    • Updated @modelcontextprotocol/sdk from 1.12.3 to 1.16.0.
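
For readers who want a feel for the readFileAsBase64Stream utility listed above, here is a minimal sketch, not the code from this PR. It assumes the approach is to collect the chunks from createReadStream and encode once at the end. The helper name streamToBase64 is hypothetical and exists only to make the sketch easy to exercise.

```typescript
import { createReadStream } from "node:fs";
import { Readable } from "node:stream";

// Hypothetical helper (not from the PR): drain any readable stream and
// return its full contents as one base64 string.
export function streamToBase64(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk: Buffer | string) => {
      // Streams opened without an encoding emit Buffers; normalize just in case.
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    });
    // Encode once, after all bytes have arrived.
    stream.on("end", () => resolve(Buffer.concat(chunks).toString("base64")));
    stream.on("error", reject);
  });
}

// Sketch of the utility named in the changelog; the real body may differ.
export function readFileAsBase64Stream(filePath: string): Promise<string> {
  return streamToBase64(createReadStream(filePath));
}
```

Encoding only after concatenation is deliberate: base64 works on 3-byte groups, so encoding each chunk separately and joining the strings would generally produce invalid output whenever a chunk's length is not a multiple of three.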

Server Details

  • Server: filesystem
  • Changes to: tools

Motivation and Context

Currently, the server-filesystem MCP server has a read_file tool, which has optional head and tail params and always treats the file as utf-8 text.

Issue #533 contains a user report:

When asking Claude app with the filesystem MCP server enabled, asking it to read and analyze a file returns:

I can see there's an image file called "Untitled.png" in the directory. However, I need to clarify something: while I can see that the file exists, I cannot actually open or view image files. I can only work with text-based files and perform file system operations. If you'd like me to analyze the image, you would need to describe what's in it or share it directly in our conversation. Is there something specific about the image you'd like to discuss?

This PR adds support for reading media files (audio and image) with a new read_media_file tool, and the old read_file tool is renamed to read_text_file for clarity.
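
As a rough illustration of what a read_media_file result could look like, the sketch below maps a file extension to a MIME type and wraps base64 data in an image or audio content item. The extension table, the fallback behavior, and the names mimeTypeFor and toMediaContent are assumptions made for this example; the PR's actual handler may differ.

```typescript
import { extname } from "node:path";

// Hypothetical extension-to-MIME table; the PR's actual list may differ.
const MEDIA_MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".ogg": "audio/ogg",
};

// Shape of an MCP image or audio content item: base64 data plus MIME type.
type MediaContent = { type: "image" | "audio"; data: string; mimeType: string };

// Look up the MIME type from the (case-insensitive) file extension.
export function mimeTypeFor(filePath: string): string | undefined {
  return MEDIA_MIME_TYPES[extname(filePath).toLowerCase()];
}

// Assumption: unknown extensions are rejected rather than given a fallback type.
export function toMediaContent(filePath: string, base64Data: string): MediaContent {
  const mimeType = mimeTypeFor(filePath);
  if (!mimeType) {
    throw new Error(`Unsupported media file type: ${filePath}`);
  }
  const type = mimeType.startsWith("image/") ? "image" : "audio";
  return { type, data: base64Data, mimeType };
}
```

A tool handler could then return `{ content: [toMediaContent(path, data)] }`, with `data` produced by the base64 reader.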

How Has This Been Tested?

Using the Inspector UI:

small-ss

Using the Inspector CLI

Screenshot 2025-07-18 at 4 30 30 PM

Breaking Changes

The tool name read_file is now read_text_file and its description is updated to reflect that it only works with text files. This should not present an issue to clients, since it will make the decision of when to use the tool clearer.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follow MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

Error when reading large files

  • Error is Maximum call stack size exceeded
  • This tool exposes what I believe is an existing SDK issue that appears when the file is larger than a certain size.
  • It occurs when attempting to read an example file called des.png that is 4 MB in size:

Calling the tool for a large file with Inspector UI

Screenshot 2025-07-18 at 4 53 18 PM

Calling the tool for a large file with Inspector CLI

Screenshot 2025-07-18 at 4 51 40 PM

Process of elimination:

It isn't the function

  • Extracting the function that reads the file into a simple test script shows that the function has no issue reading and returning the large file, so the error is not coming from that function.
Screenshot 2025-07-18 at 4 57 01 PM

It isn't the server

  • The server tool that invokes it does nothing but return the result, so it cannot be the server.
Screenshot 2025-07-18 at 5 00 26 PM

It isn't the Inspector

  • Both the Inspector UI and CLI return the same error, but they use completely different code to invoke the tool. Neither does any recursive post-processing of the result that would lead to a stack overflow.
Inspector UI
Screenshot 2025-07-18 at 5 13 55 PM
Inspector CLI
Screenshot 2025-07-18 at 5 05 09 PM

Found culprit in SDK

  • Line 607 of Protocol.ts is executed when the response is received. It parses the response result against resultSchema.
Screenshot 2025-07-18 at 5 29 10 PM
  • By replacing
    • result = resultSchema.parse(response.result); with
    • const result = response.result;

The error no longer occurs with the large file:
Screenshot 2025-07-18 at 6 01 39 PM

  • I explored lots of ways to make the compiled schema less massive using z.lazy() and z.discriminatedUnion() but in the end that didn't fix it.

  • The actual culprit turned out to be Zod's .base64() validation.

  • Under the hood, z.string().base64() uses a regular expression to validate the string. While this regex is fine for typical inputs, running it against a string that is several megabytes long can cause the JavaScript engine's regex engine to hit its internal recursion limit, resulting in a "Maximum call stack size exceeded" error.

Problem solved

  • What was needed was a more robust base64 checker in the SDK. I added one, confirmed that it fixes the problem, and opened a PR against the SDK with the change.

  • In the Inspector CLI and UI, I retested the large file that was causing the "Maximum call stack size exceeded" error. The error no longer occurs and the 4 MB file can be processed.
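
To show what a base64 check without a regular expression can look like, here is a minimal sketch. It is not the code from the SDK PR; the function name isBase64 and the decision to accept the empty string are assumptions for this example. It walks the string once in a plain loop, so the call stack depth does not grow with input size.

```typescript
// Hypothetical regex-free validator for standard (non-URL-safe) base64.
export function isBase64(value: string): boolean {
  const len = value.length;
  // Padded base64 always has a length that is a multiple of 4.
  if (len % 4 !== 0) return false;

  // Strip at most two trailing '=' padding characters (char code 61).
  let end = len;
  if (end > 0 && value.charCodeAt(end - 1) === 61) end--;
  if (end > 0 && end < len && value.charCodeAt(end - 1) === 61) end--;

  // Every remaining character must come from the base64 alphabet.
  for (let i = 0; i < end; i++) {
    const c = value.charCodeAt(i);
    const ok =
      (c >= 65 && c <= 90) || // A-Z
      (c >= 97 && c <= 122) || // a-z
      (c >= 48 && c <= 57) || // 0-9
      c === 43 || // +
      c === 47; // /
    if (!ok) return false;
  }
  return true;
}
```

In a Zod schema, a check like this could be attached with `z.string().refine(isBase64)` in place of `.base64()`. Whether the SDK fix takes that exact form is not stated here.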

Inspector CLI

Screenshot 2025-07-18 at 8 28 42 PM

Inspector UI

Screenshot 2025-07-18 at 8 27 55 PM

cliffhall and others added 6 commits July 18, 2025 13:42
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ext_file-and-add-read_media_file

Add read_media_file tool and rename read_file
cliffhall added a commit to cliffhall/mcp-typescript-sdk that referenced this pull request Jul 19, 2025
…obResourceContent.

* This fixes an issue found in the servers repo with this PR: modelcontextprotocol/servers#2382

* Under the hood, z.string().base64() uses a regular expression to validate the string.

* While this regex is fine for typical inputs, running it against a string that is several megabytes long can cause the JavaScript engine's regex parser to hit its internal recursion limit, resulting in a "Maximum call stack size exceeded" error.