New Content Type for "UI" #287
-
Hey @kentcdodds while we're at it, we should probably drop …
-
Good idea. Indeed, there are security concerns with using HTML. There are several frameworks to standardize declaring UI components; perhaps we could encourage these as custom options for that. At Microsoft, when working on Microsoft Copilot for consumer and enterprise users, we used Adaptive Cards and that worked great. The new consumer version doesn't use them anymore because they're not needed, but the M365 (Office) Copilot still uses them. Here are some more details that I wrote about it and how standardizing on Adaptive Cards helped us scale to many UIs and types of clients: https://devblogs.microsoft.com/dotnet/building-ai-powered-bing-chat-with-signalr-and-other-open-source-tools/

There's also Fluid Framework.
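For readers unfamiliar with the format, here is a minimal Adaptive Card payload; the schema is the real one from adaptivecards.io, but the card content itself is made up for illustration:

```ts
// A minimal Adaptive Card: declarative JSON that the host renders with
// native controls, so no arbitrary HTML or script crosses the boundary.
const card = {
  type: "AdaptiveCard",
  $schema: "http://adaptivecards.io/schemas/adaptive-card.json",
  version: "1.5",
  body: [
    { type: "TextBlock", text: "Monthly revenue", weight: "Bolder" },
    { type: "TextBlock", text: "$180,000", size: "ExtraLarge" },
  ],
  actions: [
    { type: "Action.OpenUrl", title: "View report", url: "https://example.com/report" },
  ],
};
```

Because the card is pure data, the host keeps full control over rendering, which sidesteps most of the HTML security concerns mentioned above.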
-
I would like this to be one of several different content types. Here are some thoughts from Claude that I think are good.

# Model Context Protocol UI Extension (MCP-UI)

## 1. Abstract

This RFC proposes an extension to the Model Context Protocol (MCP) to standardize the integration of rich UI components in chat-based AI applications. The MCP-UI Extension enables MCP servers to expose interactive UI elements ranging from simple cards to full-page applications, along with a hook system to support rich interaction lifecycles.

## 2. Introduction and Motivation

### 2.1 Background

The Model Context Protocol (MCP) has successfully established a standard for AI models to access context, tools, and resources. However, the current specification focuses on textual or basic content types, with limited support for rich interactive experiences. Applications like Convex's Chef and Stackblitz's Bolt.new demonstrate the power of combining LLM interactions with rich UIs powered by WebContainers. These experiences remain application-specific, without a standard protocol for interoperability.

### 2.2 Problem Statement

Developers building chat applications with AI models encounter several challenges:
### 2.3 Goals

This extension aims to:
## 3. Terminology
## 4. Specification

### 4.1 Capability Negotiation

MCP servers that support UI components MUST declare relevant capabilities during initialization:

```json
{
"capabilities": {
"ui": {
"components": true, // Support for UI components
"fullView": true, // Support for expanded view components
"hooks": true, // Support for lifecycle hooks
"frameworksSupported": ["html", "react", "vue"] // Optional frameworks list
}
}
}
```

Clients supporting the UI extension MUST include UI capabilities in their initialization request:

```json
{
"capabilities": {
"ui": {
"formats": ["html", "react"], // UI formats the client can render
"hooksEnabled": true // Client supports hooks execution
}
}
}
```

### 4.2 Component Registry

#### 4.2.1 Listing Components

Clients can discover available UI components by sending a `ui/listComponents` request:

```json
{
"jsonrpc": "2.0",
"id": 1,
"method": "ui/listComponents"
}
```

Servers respond with:

```json
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"components": [
{
"id": "data-chart",
"name": "Data Visualization Chart",
"description": "Interactive visualization of tabular data",
"type": "card",
"supportedFormats": ["html", "react"],
"triggers": ["tools/call:data-analyze"],
"inputSchema": {
"type": "object",
"properties": {
"data": {
"type": "array",
"description": "Data points to visualize"
},
"chartType": {
"type": "string",
"enum": ["bar", "line", "pie"]
}
}
}
},
{
"id": "code-editor",
"name": "Code Editor",
"description": "Full-featured code editor with syntax highlighting",
"type": "fullView",
"supportedFormats": ["html", "react"],
"triggers": ["tools/call:edit-code"],
"inputSchema": {
"type": "object",
"properties": {
"language": {
"type": "string"
},
"code": {
"type": "string"
},
"readOnly": {
"type": "boolean"
}
}
}
}
]
}
}
```

#### 4.2.2 Component Change Notifications

Servers SHOULD notify clients when the component registry changes:

```json
{
"jsonrpc": "2.0",
"method": "notifications/ui/components_changed"
}
```

### 4.3 Rendering Components

#### 4.3.1 Request to Render a Component

Clients request component rendering with:

```json
{
"jsonrpc": "2.0",
"id": 2,
"method": "ui/renderComponent",
"params": {
"id": "data-chart",
"format": "react", // Requested format from supportedFormats
"data": {
"data": [
{"month": "Jan", "value": 10},
{"month": "Feb", "value": 15},
{"month": "Mar", "value": 8}
],
"chartType": "bar"
},
"sessionId": "ui-session-123" // Optional client-generated session ID
}
}
```

#### 4.3.2 Component Rendering Response

For HTML format:

```json
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"format": "html",
"content": {
"html": "<div class=\"chart-container\">...</div>",
"scripts": [
"https://cdn.example.com/charts.js",
"const chart = new Chart(...);"
],
"styles": [
".chart-container { height: 300px; }",
"https://cdn.example.com/charts.css"
]
},
"metadata": {
"height": 300,
"width": "100%",
"interactivity": "high"
},
"sessionId": "ui-session-123"
}
}
```

For React format:

```json
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"format": "react",
"content": {
"module": "https://cdn.example.com/components/DataChart.js",
"props": {
"data": [...],
"chartType": "bar",
"onDataPointClick": {"__handler": "dataPointSelected"}
}
},
"metadata": {
"height": 300,
"width": "100%",
"interactivity": "high"
},
"sessionId": "ui-session-123"
}
}
```

For ESM module format (framework-agnostic):

```json
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"format": "module",
"content": {
"main": "https://cdn.example.com/components/chart-component.js",
"dependencies": [
"https://cdn.example.com/components/utils.js"
],
"props": {
"data": [...],
"chartType": "bar"
}
},
"metadata": {
"height": 300,
"width": "100%"
},
"sessionId": "ui-session-123"
}
}
```

#### 4.3.3 Component Embedding

UI components can be embedded in two ways: inline in the conversation, or as an expanded (full) view. The client determines the rendering approach based on the component's declared `type` (`card` vs. `fullView`) and the formats it is able to render; a sketch follows.
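As an illustration (assumed client behavior, not normative), a host might mount `card` components inline in the conversation and promote `fullView` components to a dedicated panel. `renderInSandbox` is a hypothetical helper that isolates the HTML; a sketch of it appears under §5.1 below.

```ts
// Assumed handling of a ui/renderComponent result in HTML format.
// The placement policy and the helper below are illustrative, not spec'd.
declare function renderInSandbox(host: HTMLElement, html: string): void; // see the §5.1 sketch

interface HtmlRenderResult {
  format: "html";
  content: { html: string; scripts: string[]; styles: string[] };
  metadata: { height: number; width: string };
}

function embedComponent(
  result: HtmlRenderResult,
  componentType: "card" | "fullView",
  chatLog: HTMLElement,
  sidePanel: HTMLElement,
): void {
  // Cards render inline with the conversation; full views get their own panel.
  const host = document.createElement("div");
  host.style.height = `${result.metadata.height}px`;
  host.style.width = result.metadata.width;
  renderInSandbox(host, result.content.html);
  (componentType === "card" ? chatLog : sidePanel).appendChild(host);
}
```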
### 4.4 Component Interaction

#### 4.4.1 Event Handling

Components can define event handlers that trigger specific actions:

```json
{
"jsonrpc": "2.0",
"method": "ui/componentEvent",
"params": {
"sessionId": "ui-session-123",
"componentId": "data-chart",
"event": "dataPointSelected",
"data": {
"point": {"month": "Feb", "value": 15}
}
}
}
```

#### 4.4.2 State Updates

Servers can push state updates to rendered components:

```json
{
"jsonrpc": "2.0",
"method": "notifications/ui/stateUpdate",
"params": {
"sessionId": "ui-session-123",
"updates": {
"selectedPoint": {"month": "Feb", "value": 15},
"highlightedSeries": "revenue"
}
}
}
```

#### 4.4.3 Form Submission

Components can submit form data through a standardized method:

```json
{
"jsonrpc": "2.0",
"id": 3,
"method": "ui/submitForm",
"params": {
"sessionId": "ui-session-123",
"formId": "config-form",
"data": {
"chartType": "line",
"timeRange": "last-30-days",
"metrics": ["revenue", "users"]
}
}
}
```

### 4.5 Lifecycle Hooks

Hooks provide integration points at critical moments in the chat interaction flow.

#### 4.5.1 Hook Types
#### 4.5.2 Hook Registration

Servers register hooks through the component registry extension:

```json
{
"jsonrpc": "2.0",
"id": 4,
"method": "ui/listHooks"
}
```

Response:

```json
{
"jsonrpc": "2.0",
"id": 4,
"result": {
"hooks": [
{
"id": "code-formatter",
"type": "lint",
"description": "Formats code according to style guidelines",
"triggers": ["postGen"],
"inputSchema": {
"type": "object",
"properties": {
"language": { "type": "string" },
"code": { "type": "string" }
}
}
},
{
"id": "dependency-validator",
"type": "test",
"description": "Validates that all dependencies are properly defined",
"triggers": ["userAccepted"],
"inputSchema": {
"type": "object",
"properties": {
"packageJson": { "type": "string" },
"imports": { "type": "array" }
}
}
}
]
}
}
```

#### 4.5.3 Hook Execution

Clients execute hooks by calling a specialized tool:

```json
{
"jsonrpc": "2.0",
"id": 5,
"method": "tools/call",
"params": {
"name": "hook:code-formatter",
"arguments": {
"language": "javascript",
"code": "function hello() {console.log('world')}"
}
}
}
```

Response:

```json
{
"jsonrpc": "2.0",
"id": 5,
"result": {
"content": [
{
"type": "text",
"text": "function hello() {\n console.log('world');\n}"
}
],
"metadata": {
"lintChanges": 2,
"issues": []
}
}
}
```

### 4.6 WebContainer Integration

The MCP-UI extension is designed to work seamlessly with WebContainer-based environments:
Example WebContainer integration:

```json
{
"jsonrpc": "2.0",
"id": 6,
"method": "ui/renderComponent",
"params": {
"id": "react-editor",
"format": "react",
"data": {
"projectPath": "/workspace/my-project",
"openFile": "src/App.js"
},
"webContainerOptions": {
"serverPort": 3000,
"filesystemAccess": ["read", "write"],
"mountPoints": ["/workspace"]
}
}
}
```

## 5. Security Considerations

### 5.1 Content Security Policy

Clients MUST implement appropriate Content Security Policies when rendering arbitrary HTML and JavaScript from MCP servers. This includes isolating rendered content and restricting which scripts, styles, and network endpoints it may load; one possible approach is sketched below.
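As one concrete illustration (not part of the proposal), a client could render the HTML format inside a sandboxed iframe with a restrictive CSP baked into the document itself; the sandbox flags and CSP directives here are assumptions, not spec'd values:

```ts
// Illustrative only: one way a client might isolate server-provided HTML.
function renderInSandbox(container: HTMLElement, html: string): HTMLIFrameElement {
  const iframe = document.createElement("iframe");
  // Omitting "allow-same-origin" gives the content an opaque origin, so it
  // cannot read the host page's cookies, storage, or DOM.
  iframe.sandbox.add("allow-scripts");
  const csp =
    `<meta http-equiv="Content-Security-Policy" ` +
    `content="default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'">`;
  iframe.srcdoc = `<!doctype html><html><head>${csp}</head><body>${html}</body></html>`;
  container.appendChild(iframe);
  return iframe;
}
```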
### 5.2 User Consent

Clients SHOULD obtain user consent before:
### 5.3 Data Privacy

Components SHOULD:
### 5.4 Resource Limitations

Clients MUST implement resource limitations to prevent abuse:
## 6. Examples

### 6.1 Simple Chart Card

**User:** Show me a chart of the last 6 months of revenue

**Assistant's Process:**
Implementation:

```json
// Tool call
{
"method": "tools/call",
"params": {
"name": "data-analytics",
"arguments": {
"metric": "revenue",
"period": "6-months"
}
}
}
// Tool response
{
"content": [
{
"type": "text",
"text": "I've analyzed the last 6 months of revenue data."
}
],
"data": {
"revenue": [
{"month": "Nov", "value": 120000},
{"month": "Dec", "value": 150000},
{"month": "Jan", "value": 130000},
{"month": "Feb", "value": 160000},
{"month": "Mar", "value": 140000},
{"month": "Apr", "value": 180000}
]
}
}
// UI component rendering
{
"method": "ui/renderComponent",
"params": {
"id": "data-chart",
"format": "react",
"data": {
"data": [response.data.revenue],
"chartType": "line",
"title": "Monthly Revenue (Last 6 Months)"
}
}
}
```

### 6.2 Code Editor Full View

**User:** Help me refactor my React component to use hooks

**Assistant's Process:**
Implementation:

```json
// UI component rendering
{
"method": "ui/renderComponent",
"params": {
"id": "code-editor",
"format": "react",
"data": {
"language": "javascript",
"code": "class MyComponent extends React.Component { ... }",
"action": "refactor",
"refactorType": "convert-to-hooks"
},
"type": "fullView"
}
}
```

### 6.3 Lint Hook Example

**User:** Generate a React component that fetches data from an API

**Assistant's Process:**
Implementation:

```json
// Generate code using LLM
// Before displaying, run the code-formatter hook
{
"method": "tools/call",
"params": {
"name": "hook:code-formatter",
"arguments": {
"language": "javascript",
"code": "function DataFetcher(){const[data,setData]=useState(null);useEffect(()=>{fetch('/api/data').then(r=>r.json()).then(setData)},[]);return <div>{data?JSON.stringify(data):'Loading...'}</div>}"
}
}
}
// Formatted result displayed to user
{
"content": [
{
"type": "text",
"text": "```jsx\nfunction DataFetcher() {\n const [data, setData] = useState(null);\n\n useEffect(() => {\n fetch('/api/data')\n .then(r => r.json())\n .then(setData);\n }, []);\n\n return <div>{data ? JSON.stringify(data) : 'Loading...'}</div>;\n}\n```"
}
]
}
```

## 7. Implementation Considerations

### 7.1 Compatibility Layer

For compatibility with existing MCP implementations, the UI extension can be introduced incrementally:
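For instance (a minimal sketch, assuming the capability shape from §4.1), a client can feature-detect the `ui` capability and fall back to plain text content when the server doesn't advertise it:

```ts
// Feature detection against the (proposed) "ui" capability from §4.1.
// Servers that never declare it keep working with text-only content.
function supportsUiComponents(serverCapabilities: Record<string, unknown>): boolean {
  const ui = serverCapabilities["ui"] as { components?: boolean } | undefined;
  return ui?.components === true;
}
```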
### 7.2 Framework Support

The protocol is designed to be framework-agnostic but includes optimizations for popular frameworks:
### 7.3 Progressive Enhancement

Implementation should follow progressive enhancement principles:
## 8. Adoption Roadmap

### 8.1 Phase 1: Draft Specification (Current)
### 8.2 Phase 2: Implementation and Testing
### 8.3 Phase 3: Standardization
## 9. References
## 10. Acknowledgments

This proposal was inspired by innovative applications like Convex's Chef and Stackblitz's Bolt.new, as well as feedback from the broader AI developer community seeking to create more interactive and productive AI-assisted experiences.

## 11. Appendix

### 11.1 Schema Definitions

See the accompanying JSON Schema documents for formal definitions of all protocol extensions.

### 11.2 WebContainer Configuration Options
For instance, in the screenshot I would imagine that the MCP server for Convex emitted that it could render a full app, and Chef decided to render that as a new tab in its Bolt.new-like interface.
-
I've been thinking about this too. What I've been imagining is more design-systems/tokens-based + generative UI. Provide context about brand, voice, tone, etc. Then specify display expectations for the UI. Then let the LLM produce the UI in the systems it has available.

Returned resources could also specify preferred display expectations.

What's led me to this thinking is trying to find a way that doesn't assume the capabilities of an agent or what it's using this information for. If it's going to use this for displaying, it can decide whether it can use this information to produce the necessary UI based on what it has available to it. If it's going to use this information for more data processing, then it won't need to concern itself with the UI parts. Just throwing HTML into an arbitrary isolated webview with lots of things turned off is certainly the easiest route, but this route might offer some really interesting mixed experiences blending the agent's branding with the upstream data/MCP.
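A rough sketch of what that might look like; `displayHints` is a hypothetical field invented here to carry display expectations, and every value in it is made up:

```ts
// Hypothetical: "displayHints" is not part of the MCP spec. A client that
// renders UI may honor these hints through its own design system; a client
// doing pure data processing can ignore them entirely.
const resourceResult = {
  contents: [
    {
      uri: "data://metrics/revenue",
      mimeType: "application/json",
      text: JSON.stringify({ months: ["Jan", "Feb"], values: [10, 15] }),
      displayHints: {
        preferred: "chart",   // e.g. "chart" | "table" | "text"
        tone: "professional", // brand/voice context for generated UI
      },
    },
  ],
};
```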
-
I do really like the idea of UI being a possible content type as a response from a tool call. Wanted to flag that this is probably related and has a new PR open. It probably doesn't make sense to try to get the concept of UI into that PR, but this would likely build on top of that work.
-
Writing to also flag this PR and this comment on the use of EmbeddedResource. The spec already supports delivery of embedded resources. I'd normally expect the provision of dynamic UI rendering to be a Host application feature. Is the proposal to require Hosts to support certain UX primitives? (Removed comment on Claude Artifacts; it's the other way round :) ).
-
Sorry, my previous answer was unhelpful - I was in spec review mode!

Working through a couple of example workflows: the LLM calls a "generate ui" tool, and the MCP Server responds with a CallToolResult containing a text embedded resource with a mimeType of … The Host would then recognise the URI scheme as a custom Resource Template that required UI rendering, and display it. It MAY choose to subscribe to updates if the Server supports it. It's also fine for a URI to be disposable.

A Host application that did this would be really, really cool, as you could use the generated UI to send new messages to the LLM, etc. It could also, for example, update based on new version numbers of the … Sophisticated Client/Server pairs may choose to use Sampling for this kind of UI generation if appropriate. Again, there's nothing stopping the promotion of a URI scheme and commonly accepted practices for Hosts to do this kind of rendering.

Another alternative would be to take the mcp-webcam approach. There's no reason why an MCP Server can't show its own UI, or generate its UI via Sampling requests. Since those features are in mcp-webcam, forking it for a POC should be straightforward (sadly, time doesn't permit me to do this right now).

There is a little guidance on some specific URI types here that is worth reviewing. It opens up implementation questions like … So I think the specification enables this kind of workflow at the moment; I'd be interested in learning more about the requirements - happy to assist in any POCs.
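A sketch of that first workflow, assuming a `text/html` mimeType and a made-up `ui://` URI scheme (embedded resources are in the spec; the mimeType and scheme are conventions being discussed here, not requirements):

```ts
// A CallToolResult embedding an HTML resource under a custom "ui://" URI.
const callToolResult = {
  content: [
    {
      type: "resource",
      resource: {
        uri: "ui://weather/card-20250101", // server-chosen, possibly disposable
        mimeType: "text/html",             // assumed; not mandated by the spec
        text: "<div><h3>Weather</h3><p>12°C, partly cloudy</p></div>",
      },
    },
  ],
};
```

A Host that recognises the `ui://` scheme could then render the resource and subscribe to resource update notifications for live updates.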
-
I like this proposal, although I think this issue is more on the HTMX side rather than on the React/Vue/Web Components/React Native side. For my requirements, I am more on https://github.com/BLamy's side, envisioning making something like https://github.com/21st-dev/magic-mcp possible in a standard way. So I created a separate issue for the components way. Let me know if it makes sense to merge it here, or keep it separate.
-
@tadasant I think we may need to make this more prominent, as these discussions keep popping up.
-
I wrote up a little bit more about my thoughts on why I think this is important here: https://www.epicai.pro/the-future-of-ai-interaction-beyond-just-text-w22ps

And made a video explaining it too: future-interaction.gh.mp4
-
Sharing MCP UI, an SDK that implements both the server and client side of UI over MCP. It's meant to be used as a playground to bootstrap this initiative in practice. Hopefully, the community can test ideas until we find what works best. Thoughts, changes, and contributions are obviously more than welcome.

Under the hood, the SDK enables any MCP server to respond with an Embedded Resource with a "ui://" or "ui-app://" URI. The client SDK allows hosts to render it using a supported method (currently raw HTML or an external app) and handles follow-up interactions through events (e.g., tool calls). We need to find a better delivery method, perhaps RSC/remotedom/…

mcpui-x.mp4
-
Yes - this would be fantastic! On our MyLife platform we most definitely ship UI requests to the frontend (a totally prototype-stage draft in our case, awaiting more focused attention from frontend enthusiasts, of which I am not one) so that UI elements (input and other) can be constructed, altered, or destroyed based on intelligence rendered on the backend.
-
I really like this and feel like it's the direction LLM-infused software is heading. But I don't think it's really necessary to specify, nor is it secure, to return UI code that should be rendered. Rather, clients (which already have LLMs to use) can interpret any data returned by a server with the intent to render UI. What could be specified is a hint from the server that the response could trigger a generated UI. But that should be entirely up to the client to handle, independent of what the server says. As a client, it would not be safe to simply accept code for rendering from a server.

Use the browser as an equivalent story here - we have frames to allow a similar experience, but they're largely unused because of poor security and consistency across host-delivered UI. Instead, a browser client fetches data, and then uses it to conditionally render UI owned by the client.

I think this is a prescient idea, which we will see happen without any specification -- and we don't need to specify the behavior here, because it adds a lot of surface area to the protocol for what is ultimately more of an emergent outcome than one defined in the spec. I'd be happy to know where there are entirely secure cases for taking untrusted, executable code for rendering, or where we think today's client-side MCP implementations with access to LLMs can't already be prompted into rendering UI based on data from server responses.
-
Let's imagine that your MCP server is really smart, and what it wants to receive from the hosting environment is a (sandboxed) UI surface that it can do anything with -- WebGL visualizations, audio, JavaScript execution -- the whole thing. Maybe the protocol should be inverted: the agent simply constrains this UI surface in some way when it calls the MCP server, and the server renders into the sandbox with those constraints/preferences. As always, if I can't enable my user agent to re-render your brand's font choices as Papyrus, then it's not really my agent.

TL;DR maybe start with iframes?
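A rough sketch of that inversion, with entirely invented fields (nothing like this exists in the spec today): the client describes the sandboxed surface and its constraints up front, and the server renders into it.

```ts
// Hypothetical "inverted" flow: the agent offers a constrained surface.
const surfaceOffer = {
  method: "ui/offerSurface", // invented method name
  params: {
    surface: {
      width: 640,
      height: 480,
      capabilities: ["webgl", "audio", "javascript"], // what the sandbox allows
      overrides: { fontFamily: "Papyrus" },           // user-agent preferences win
    },
  },
};
```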
-
Having clients decide when and how to render UI (like today) might be enough for simple data visualization, but it won't be enough for interactive UI snippets (for example, choosing an option) - which …

Currently …
-
How about this
-
This is an interesting POC with mcp-ui and Shopify web components (public). Very rough around the edges, of course, but note the flow being affected by user interaction (including opening a tab). That should of course be left solely to the client's decision.

VID-20250526-WA0047.mp4
-
Pre-submission Checklist
Your Idea
There are some cases where text input and output are a great user experience, and other times the user may want to interact with a generated UI with the regular buttons they're used to (for example, a simple stopwatch), or even a dynamic component like a map or graph.
Making it so tools can respond with this kind of UI would increase the utility of the MCP spec.
Desired Response (updated in the spirit of modelcontextprotocol/modelcontextprotocol#180):
There are some significant security concerns here for clients that don't render this in an isolated environment like an iframe (or potentially a realm in the future). There could potentially be a reason to only support iframe elements to enforce that isolation (or even a response type of `{ "type": "iframe", "src": "..." }`). But I think it would be better if clients managed this isolation automatically.

I also think that a relatively powerful model for this would be React Server Components (`{ "type": "rsc", "data": "..." }`), but we can leave that discussion for another time. Just starting with HTML would be pretty powerful, even if it was limited to iframe only.

UPDATE: More on this here: https://www.epicai.pro/the-future-of-ai-interaction-beyond-just-text-w22ps
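For concreteness, a hedged sketch of what such a tool response could look like; neither `html` nor `iframe` is a real MCP content type today, and all field names here are assumptions:

```ts
// Hypothetical UI content types from the proposal above (not in the spec).
const toolResult = {
  content: [
    { type: "text", text: "Here's a stopwatch you can use:" },
    // Option 1: raw HTML that the client isolates automatically.
    { type: "html", html: '<button id="start">Start</button><span id="time">0:00</span>' },
    // Option 2: iframe-only, making the isolation explicit in the protocol.
    { type: "iframe", src: "https://tools.example.com/stopwatch" },
  ],
};
```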
Scope