MSC4295: Bot bounce limit - a better loop prevention mechanism #4295

Open: m13253 wants to merge 28 commits into main

@m13253 m13253 commented May 31, 2025

Rendered

(About me: I develop E2EE-capable Matrix bots and bridges tailored for two communities. Recently, I open-sourced my matrixbot-ezlogin Rust library to help people build Matrix bots without worrying about the authentication and E2EE bootstrap process.)

@m13253 force-pushed the bot-bounce-limit branch from 626f721 to 01b72b8 on May 31, 2025 03:31
Member

Implementation requirements:

  • Sending bot
  • Receiving bot (the one that would loop)

@turt2live added the labels proposal (A matrix spec change proposal), client-server (Client-Server API), kind:feature (MSC for not-core and not-maintenance stuff), and needs-implementation (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) on May 31, 2025
3. For a room intended for technical support, the operator can run an AI-powered bot to automatically answer common questions. Such an AI bot is allowed to trigger other bots for certain helpful tasks.
4. The room operator can run a "UTD notification bot" that notifies room members that their messages can't be decrypted by others. However, it is very important to prevent it from replying to another bot's message.
5. When bridging rooms across three or more platforms (e.g., Matrix ⇌ Telegram ⇌ IRC ⇌ Matrix), it is necessary to make sure each bridge doesn't pick up another bridge's messages.
6. Bridges supporting double-puppeting need to ignore messages sent by a reverse puppet. Although they already employ proprietary methods (e.g., vendor-prefixed tags like `fi.mau.double_puppet_source` or a list of ignored user IDs), it could be very useful to provide a standardized loop-prevention mechanism, allowing bridges from different vendors to work in harmony in the same room.
Member

This proposal doesn't appear to solve the problem that fi.mau.double_puppet_source was made for. Specifically, when a bridge sends a message from a double puppet (not a bridge ghost user), it must have some flag to prevent echoing back the message to the remote network where it came from. The flag is not meant to stop any other bridge or bot from reacting to the message, it's only meant to be detected by the origin bridge.

Author

> This proposal doesn't appear to solve the problem that fi.mau.double_puppet_source was made for. Specifically, when a bridge sends a message from a double puppet (not a bridge ghost user), it must have some flag to prevent echoing back the message to the remote network where it came from. The flag is not meant to stop any other bridge or bot from reacting to the message, it's only meant to be detected by the origin bridge.

It’s an honor to get feedback from Mautrix’s side!

After understanding your explanations, I admit this proposal doesn’t solve fi.mau.double_puppet_source’s problem. In fact, this proposal seems to solve a completely different problem orthogonal to fi.mau.double_puppet_source’s problem.

  1. Although both mechanisms are designed to prevent infinite loops, fi.mau.double_puppet_source prevents looping between two platforms, while m.bounce_limit prevents looping within Matrix.
  2. fi.mau.double_puppet_source doesn’t prevent other bots or bridges from interacting with the message, while m.bounce_limit sets a bounce limit to do so.
  3. On the other hand, fi.mau.double_puppet_source doesn’t deal with the situation where a room has two independent bridge instances (e.g., one relaybot maintained by the room operator, and one double-puppet maintained by a room member on a separate homeserver), while m.bounce_limit tries to solve this problem.

(Please correct me if my understanding is still inaccurate.)

I will probably need to rephrase or remove this sentence. I think m.bounce_limit won’t replace fi.mau.double_puppet_source. They will co-exist, because the two mechanisms solve two different problems.

Author

Update: I replaced this example in the Background section with another example.


However, there are a few disadvantages of `m.notice`:

1. It is analogous to `m.text`, which doesn't support attached files or encrypted images.
Member

Extensible events already have a solution for this #3955

Author

@m13253 m13253 May 31, 2025

> Extensible events already have a solution for this #3955

Looks interesting. The difference is that MSC3955 uses a boolean to mark automated messages, while this MSC4295 uses an integer.

The advantage of an integer TTL is that it allows multiple bots to work together, which is the biggest motivation of this proposal.

Do you think combining both ideas together is a way to go? (Extensible Events + integer TTL)

(Informally I’ll call it TTL, as networking people may be more familiar with this term. Formally it should be called “Bounce Limit.”)
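
To make the boolean-versus-integer difference concrete, here is a minimal Go sketch of how a receiving bot might decide whether to respond. It assumes hop-limit-like semantics that this thread does not fully spell out: a missing limit (represented as 0) means a fresh chain, a limit of 1 means "delivered, but do not respond", and any larger value is decremented on each reply. The names `nextBounceLimit` and `defaultBounceLimit`, and the default of 3, are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// defaultBounceLimit is an arbitrary starting value assumed for this sketch;
// it is not taken from the proposal.
const defaultBounceLimit int64 = 3

// nextBounceLimit takes the limit found on an incoming message (0 = missing)
// and returns the limit to attach to a reply, plus whether a reply is allowed.
func nextBounceLimit(incoming int64) (limit int64, mayReply bool) {
	switch {
	case incoming == 0:
		// No limit present: assume this starts a new chain.
		return defaultBounceLimit, true
	case incoming > 1:
		// Budget remains: reply and pass on one bounce fewer.
		return incoming - 1, true
	default:
		// A limit of 1 means the chain ends here. Negative values are not
		// covered by this thread; the sketch conservatively stays silent.
		return 0, false
	}
}

func main() {
	// Simulate a chain of bots, each reacting to the previous bot's reply.
	incoming := int64(0) // the first message carries no limit
	for hop := 1; ; hop++ {
		next, ok := nextBounceLimit(incoming)
		if !ok {
			fmt.Printf("bot %d sees limit %d and stays silent\n", hop, incoming)
			break
		}
		fmt.Printf("bot %d sees limit %d and replies with limit %d\n", hop, incoming, next)
		incoming = next
	}
}
```

With a boolean flag, the chain would stop at the first automated message; with an integer, a few bots can cooperate before the budget runs out.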

Author

@m13253 m13253 May 31, 2025

Update: I included MSC3955 in the Existing solutions section, parallel to the m.notice subsection.


1. It is analogous to `m.text`, which doesn't support attached files or encrypted images.
2. It is designed for automated messages, not bridged messages sent originally by a human.
3. Similarly, `m.notice` won't be picked up by bridges to forward to a bridged platform.
Member

Bridges can and do pick up m.notice if configured to do so

Author

> Bridges can and do pick up m.notice if configured to do so

Thanks for your confirmation!

Indeed, I just checked the Matrix API Spec: The spec doesn’t say whether bridges should or shouldn’t pick up m.notice.

I was careless. I will rephrase this sentence.

m13253 added 5 commits May 31, 2025 11:25
Signed-off-by: Star Brilliant <coder@poorlab.com>

These are invalid forms, and their normalization rules upon receiving:

1. The number 0, which should be treated as missing. (This design is to simplify the development of bots in certain programming languages, such as Go.)

This doesn't make semantic sense. Logically a value of 0 would mean "do not forward".

Author

@m13253 m13253 Jun 2, 2025

> This doesn't make semantic sense. Logically a value of 0 would mean "do not forward".

Thanks for your comment.

Here are two explanations of this decision:

  1. This is an analogy to the Hop Limit in IP networks.

    An IP packet with a Hop Limit of 1 can still be delivered to the recipient, but if the recipient is a router, it isn’t allowed to forward the packet anywhere else.
    An IP packet with a Hop Limit of 0, if I remember correctly, is invalid.
    Therefore, a developer with IP networking experience may find the current design more familiar than one where 0 is a valid value.

  2. If we make 0 an invalid value, some programming languages need fewer steps, because they no longer have to distinguish 0 from a missing value.

    One example of such a programming language is Go.
    To distinguish a 0 value from a missing value, the Go struct needs to be written as:

    type OriginalRoomMessageEventContent struct {
        Body        string `json:"body"`
        BounceLimit *int64 `json:"m.bounce_limit"` // *int64 instead of int64
        ...
    }

    This uses a pointer to distinguish 0 from a missing value, which means slower performance (although negligible), more memory fragmentation, and more work on the developer side to get the logic right.
    C++ may be similar, depending on which JSON library you use.
    In other programming languages that support nullable data types, such as Rust’s Option, C#’s Nullable, or TypeScript’s T | undefined, at least one more check is required to detect the missing value and to extract the valid number.

Therefore, making 0 an invalid value is just simpler, faster to develop and run, and more similar to other existing network protocols.
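
For comparison, here is a hedged sketch of the non-pointer variant that this argument favors, using only Go's standard `encoding/json`. The struct `MessageContent` and the helpers `decode` and `hasBounceLimit` are hypothetical names for illustration. The sketch covers only the rule discussed here (0 is treated as missing) and does not handle any other invalid form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessageContent is a hypothetical, trimmed-down event content type.
// Because 0 is normalized to "missing", a plain int64 is enough, and
// omitempty keeps the field out of outgoing JSON when it is unset.
type MessageContent struct {
	Body        string `json:"body"`
	BounceLimit int64  `json:"m.bounce_limit,omitempty"`
}

// hasBounceLimit reports whether a usable limit was received. An absent
// field and an explicit 0 both decode to 0, so both count as missing.
func hasBounceLimit(c MessageContent) bool {
	return c.BounceLimit != 0
}

// decode parses raw event content into MessageContent.
func decode(raw string) (MessageContent, error) {
	var c MessageContent
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	inputs := []string{
		`{"body":"hi"}`,
		`{"body":"hi","m.bounce_limit":0}`,
		`{"body":"hi","m.bounce_limit":3}`,
	}
	for _, raw := range inputs {
		c, err := decode(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s -> limit=%d present=%t\n", raw, c.BounceLimit, hasBounceLimit(c))
	}
}
```

The trade-off is that this variant cannot tell an explicit 0 apart from an absent field, which is only acceptable if 0 never carries a meaning of its own.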

Author

By the way, I have another question: in the existing Matrix protocol, if a field is invalid, how should an implementation treat it?

Should an implementation treat it as 0, treat it as missing, or reject the message altogether? Perhaps this new proposal needs to be consistent in this respect…

Labels: client-server, kind:feature, needs-implementation, proposal
4 participants