Description
Is your feature request related to a problem? Please describe.
We propose introducing a new transport to the core A2A protocol, identified by the string "stdio", that allows carrying the A2A protocol over the standard UNIX stdin and stdout streams. The proposal defines how A2A requests are passed over stdin and how A2A responses are read from stdout. The primary complication for non-HTTP transports is how to carry the data that HTTP transports pass via headers. To address this, we define a simple message format that is identical to the Language Server Protocol message format: a header part, followed by a content part.
Describe the solution you'd like
Stdio Protocol
Message Format
This transport specification is strongly informed by the Language Server Protocol. It is identical, except that the content of each message is an A2A message rather than an LSP message.
Every message exchanged via the stdio transport is composed of a header followed by the message content. That is:
stdio-message = header '\r\n\r\n' content # A header part, followed by \r\n\r\n, then the content part
header = header-field ['\r\n' header-field]* # One or more header fields, joined by \r\n. Header fields follow HTTP.
content = json-object # Content is ALWAYS a JSON-RPC message
where each header-field conforms to HTTP header semantics. The content MUST be a JSON-RPC message as defined by the A2A JSON-RPC Transport.
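For illustration, a complete stdio-message carrying a hypothetical (and truncated) A2A request would look like the following on the wire, where each header line and the blank separator line are terminated by \r\n, the params object is elided, and the Content-Length value is the byte length of the content:

```
Content-Length: 67
Content-Type: application/json

{"jsonrpc": "2.0", "id": 1, "method": "message/send", "params": {}}
```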
Required Headers
Every message MUST include the Content-Length and Content-Type headers. This is aligned with the LSP requirements and helps ensure that messages are parsed correctly.
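A minimal sketch of this framing in Python (the helper names are mine, not from any SDK): an agent process would call these with sys.stdin.buffer / sys.stdout.buffer, while a client would use the pipes of the spawned subprocess.

```python
import json

def write_message(stream, payload: dict) -> None:
    """Frame a JSON-RPC payload with the headers above and write it to a binary stream."""
    body = json.dumps(payload).encode("utf-8")
    header = (
        f"Content-Length: {len(body)}\r\n"
        "Content-Type: application/json\r\n"
        "\r\n"
    ).encode("ascii")
    stream.write(header + body)
    stream.flush()

def read_message(stream) -> dict:
    """Read one framed message: header fields up to a blank line, then Content-Length bytes."""
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line ends the header part
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    return json.loads(stream.read(length))
```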
AgentCard Representation
Stdio-based agents have a bootstrapping problem: in order to create a client to an agent, we need an AgentCard, because the AgentCard indicates the supported transports for the agent. However, a client for a stdio-based agent must have access to the handles for the stdin/stdout of the agent, which implies that we've already started the agent and know it is a stdio agent. We resolve this by having the AgentCard encode the instructions for running the subprocess in the url field of the AgentCard or AgentInterface.
The url field of a "stdio" transport MUST be the command to execute to start the subprocess. Note that this breaks the assumption of the url field containing a valid URL.
Example:
{
"name": "My Local Process Agent",
"url": "npx run my-agent@0.3.0",
"preferredTransport": "stdio",
# Rest of the AgentCard...
}
Example where stdio is an alternate transport:
{
"name": "Super Travel Agent",
"url": "https://agents.example.com/travel/v1",
"additionalInterfaces": [
{"url": "npx run super-travel-agent-5000@32.1.0", "transport": "stdio"}
]
}
With this option, the process of creating a client for a stdio-based agent is clear: spawn the agent subprocess, and connect to the stdin/stdout streams. When the client is closed, the subprocess is terminated. The lifecycle of the client and the lifecycle of the subprocess are tied together. It is expected that clients will perform an immediate fetch of the AgentCard after spawning the subprocess. This allows users to register “stubbed” AgentCards with minimal information about the agent (just the name + url fields, for example), then retrieve the full AgentCard from the running agent.
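To make that lifecycle concrete, here is a rough sketch of what a client construction path could look like. This is illustrative only, not the SDK API; it reuses the write_message/read_message helpers sketched in the Message Format section and assumes a hypothetical local file holding the stubbed AgentCard.

```python
import json
import subprocess

# Hypothetical stubbed AgentCard file containing at least the name and url fields.
with open("my-agent.agentcard.json") as f:
    card = json.load(f)

# Spawn the agent; for a "stdio" transport the url field holds the command,
# e.g. "npx run my-agent@0.3.0". shell=True is one possible execution
# semantics -- see the FIXME below.
proc = subprocess.Popen(
    card["url"],
    shell=True,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Requests are written to the agent's stdin and responses read from its stdout,
# using the framing helpers sketched earlier.
write_message(proc.stdin, {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "message/send",
    "params": {},  # a real A2A MessageSendParams object would go here
})
response = read_message(proc.stdout)

# The client and subprocess lifecycles are tied: closing the client terminates the agent.
proc.terminate()
```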
Using a command string is how many systems have integrated running local MCP servers, so it should be familiar to both developers and clients. See the Cursor, Zed, and Gemini CLI documentation.
The expectation for how local agents are published is that a (possibly stubbed) AgentCard is provided via some means (perhaps as a file in GitHub, perhaps as a static document available via HTTPS, perhaps as a code block in a README for the agent). Systems that allow users to add A2A agents, such as IDEs or coding agents, then have a unified path: adding a new agent means adding its AgentCard. This simplifies the integration code -- there is no need to understand the difference between remote agents and local-process agents; leave that to the client constructor.
If custom configuration for running an agent is required (such as providing environment variables or command-line flags), the expectation is that the user constructs a modified AgentCard encoding these parameters into the url field. For example:
SOME_ACCESS_TOKEN=$(cat secret-file.txt) uvx run secure-agent --use-special-token
FIXME: We should define the exact semantics of how the command is executed. The example above is deliberately complicated and relies on shell features (environment-variable assignment and command substitution); that may be overkill, but it allows the greatest flexibility.
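To make the question concrete, here are two candidate execution semantics sketched in Python; neither is mandated by this proposal.

```python
import shlex
import subprocess

command = "SOME_ACCESS_TOKEN=$(cat secret-file.txt) uvx run secure-agent --use-special-token"

# Option A: run the command through a shell. This supports environment-variable
# assignment and $(...) command substitution, as used in the example above.
proc = subprocess.Popen(command, shell=True,
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Option B: split the string into argv and execute it directly. Simpler and
# avoids shell surprises, but the shell features used above would not work.
argv = shlex.split("npx run my-agent@0.3.0")
proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```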
Option: Turn these into URLs. We could keep this field formatted as a URL (or a URI) by defining a custom URI scheme as a prefix, such as exec://npx run my-agent@0.3.0. While this would avoid breaking the expectation that the AgentCard.url field contains a URL value, I feel there is not much value in it -- using the field would require parsing it as a URI, checking the custom scheme, and then pulling out the command, whereas assuming that the url field simply contains the command is simpler.
Noted potential problems:
- By pushing the subprocess execution into the A2A SDK/Client construction, we are potentially obscuring the running subprocesses from the "host" application (such as the IDE/coding agent). This may be undesirable.
  - This can be alleviated with SDK support for injecting a process executor, which would allow the host framework to be aware of and track running processes (see the sketch after this list).
- Similarly, if the AgentCard owns the details for launching the subprocess, the host loses the ability to provide other, but still ultimately stdio-based, means of launching a subprocess:
  - For example, a user may want to configure the agent to be launched via a docker-compose configuration file. This is doable with just an "exec" string (i.e. docker-compose -f config.yml up), but there may be good reasons to have a structured config for this.
  - The aforementioned injectable process executor can also potentially alleviate this, if a custom format for the url field is used.
- Specifying configuration for the running process as described can be cumbersome. It involves encoding all configuration in a string, rather than allowing structured env and args configuration, which is how most tooling integrates this.
  - Unfortunately, we don't have a great place to put an env or args field in the AgentCard in relation to the transport. We could augment this proposal to support transport-specific arguments. That could be useful beyond this case, such as for gRPC servers that use custom channel configuration.
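A rough illustration of what such an injectable process executor could look like (entirely hypothetical; not part of this proposal or any SDK):

```python
from typing import BinaryIO, Protocol, Tuple

class ProcessExecutor(Protocol):
    """Host-provided hook for starting and stopping stdio agents, letting the host
    track subprocesses or launch them via docker-compose, containers, etc."""

    def start(self, command: str) -> Tuple[BinaryIO, BinaryIO]:
        """Launch the agent described by `command` and return its (stdin, stdout) streams."""
        ...

    def stop(self, command: str) -> None:
        """Terminate the agent previously launched for `command`."""
        ...

# The A2A client would call executor.start(card.url) rather than spawning the
# subprocess itself, leaving process ownership with the host application.
```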
Alternative 1: Host-owned subprocess spawning. One alternative is to move the responsibility of starting and stopping an agent subprocess to the host. The host would need to be aware that this agent runs as a local subprocess and directly construct the stdio-backed client for it, bypassing the SDK standard of AgentCard-based client construction. A possible bridge between the two is a convention of the agent subprocess immediately printing its AgentCard to stdout. This would let the host retrieve an AgentCard for the subprocess and then use the standard AgentCard-based client construction code from the SDK (note that providing the stdin/stdout streams to the client constructor is still an open question). Note that this makes the purpose of the url field unclear. If desired, this option can be explored further and better defined.
Alternative 2: Process file descriptor URLs. On Linux, the file descriptors of every process are available in the /proc filesystem. This means it is technically possible to produce a file:// URL identifying the stdin and stdout file descriptors of a running process, for example file:///proc/32/fd/0. This would allow a client to connect to the stdio streams of a running process, but there are severe downsides: stdio streams are connectionless, so multiple clients connected to one pair of streams could cause bizarre message interleaving as two clients simultaneously write to or read from a stream. Additionally, reading the file descriptors of a process requires specific privileges on Linux, meaning the host process must have been granted those privileges a priori.
Describe alternatives you've considered
Alternative 1: Local HTTP serving. It is possible to run an agent as a subprocess and still connect to it via HTTP. There are two concerns to address with this path:
- Picking ports. The agent must choose a port to serve on, which may already be taken. This is a complication that every agent would need to solve (one possible mitigation is sketched after this list).
- Retrieving AgentCards. How does the parent process know what the AgentCard of the spawned process is? This could be solved by having the process print the AgentCard JSON to stdout.
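For reference, a common mitigation for the port-picking concern, combined with printing the AgentCard to stdout, is sketched below; this is not part of the proposal, just an illustration of how a local HTTP agent could bootstrap itself.

```python
import json
import socket

# Bind to port 0 so the operating system picks an unused port.
sock = socket.socket()
sock.bind(("127.0.0.1", 0))
port = sock.getsockname()[1]

# Advertise the chosen port to the parent process by printing the AgentCard.
card = {"name": "My Local Process Agent", "url": f"http://127.0.0.1:{port}/"}
print(json.dumps(card), flush=True)

# ... hand `sock` to the HTTP server that serves the A2A endpoints ...
```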
Additional context
No response
Code of Conduct
- I agree to follow this project's Code of Conduct