MSC2516: Add a new message type for voice messages #2516
Changes from 6 commits: 3a7d67f, 4607d6d, a85b80d, e93cf14, 87ac8dd, 442b900, f9ac5d2, 30100fa, 19d4d5c
# Add a separate message type for voice messages
The Matrix spec currently defines a message type, `m.audio`, for audio files.

Review comment: A link to https://matrix.org/docs/spec/client_server/r0.6.1#m-audio would be helpful here.

Other messaging apps also have a dedicated type for voice memos,
since voice memos carry a different meaning and call for different behaviour.
This MSC proposes introducing an `m.voice` message type.

## Proposal

Even if they are not the primary mode of communication for nerds,
voice memos are very important to many users of modern instant messaging services.
To provide great voice messages, clients need to treat them differently from generic audio files.

For example, WhatsApp renders them differently, to highlight that they are a form of conversation.
WhatsApp also always downloads them automatically because, like a text message,
they should be available to consume as early as possible.
This lets the recipient see at a glance that they are expected
to listen to the voice message now, rather than later.

Matrix voice messages should reinforce the authenticity, originality
and potential urgency of the audio content.

I therefore propose introducing a new message type, `m.voice`, with the same
content as `m.audio`, but handled slightly differently by clients.

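A minimal Python sketch of what message content could look like under this proposal. It assumes `m.voice` reuses the `m.audio` fields from the r0.6.1 spec (`body`, `url`, and an `info` object with `duration` in milliseconds, `mimetype`, and `size` in bytes). The helper `make_audio_content`, the `mxc://` URI, the MIME type and the numbers are all hypothetical placeholders.

```python
import json


def make_audio_content(msgtype: str, body: str, mxc_url: str,
                       duration_ms: int, mimetype: str, size: int) -> dict:
    """Hypothetical helper: build m.room.message content with m.audio-style fields."""
    return {
        "msgtype": msgtype,
        "body": body,        # plain-text fallback shown by clients that don't know the msgtype
        "url": mxc_url,      # mxc:// URI of the uploaded audio file
        "info": {
            "duration": duration_ms,  # milliseconds
            "mimetype": mimetype,
            "size": size,             # bytes
        },
    }


# Placeholder values; only the msgtype differs between the two events.
voice = make_audio_content("m.voice", "Voice message", "mxc://example.org/abc123",
                           4500, "audio/ogg", 36000)
audio = make_audio_content("m.audio", "Voice message", "mxc://example.org/abc123",
                           4500, "audio/ogg", 36000)

print(json.dumps(voice, indent=2))
```

Keeping the content identical to `m.audio` means a client that already plays `m.audio` can reuse its player and only needs to branch on `msgtype` for the different presentation.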
### Related links
- [A long-standing issue on Riot Web that calls for voice messages](https://github.com/vector-im/riot-web/issues/1358)
- [An earlier proposal to send m.typing-like status codes when recording](https://github.com/matrix-org/matrix-doc/pull/310)

## Potential issues

Introducing a new message type means client developers have to implement it;
until their clients do, many users won't be able to use the feature.

## Alternatives

Alternatively, a flag such as `voice = true` or `type = "voice"` could be added to the `m.audio` event content.
I'm not sure which would be the more canonical approach here.

This alternative (extending the `m.audio` message type) has the benefit
of backwards compatibility for free: clients that don't know about the flag
still render the message as ordinary audio. However, we should keep
types as simple as possible.
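The backwards-compatibility trade-off between the two options can be sketched with two hypothetical clients. This is an illustration only: the `voice` key stands in for the flag suggested above (its exact name is undecided), and both render functions are made up. It assumes, as discussed in review, that a client which doesn't understand a msgtype falls back to showing the `body` as text.

```python
def legacy_render(content: dict) -> str:
    """A hypothetical client that predates voice messages."""
    if content.get("msgtype") == "m.audio":
        return "audio player"
    # Unknown msgtypes fall back to the plain-text body.
    return "text: " + content.get("body", "")


def updated_render(content: dict) -> str:
    """A hypothetical client that understands both options."""
    msgtype = content.get("msgtype")
    if msgtype == "m.voice":  # option 1: proposed new msgtype
        return "voice player"
    if msgtype == "m.audio":
        # option 2: hypothetical flag inside m.audio content
        return "voice player" if content.get("voice") is True else "audio player"
    return "text: " + content.get("body", "")


new_msgtype = {"msgtype": "m.voice", "body": "Voice message",
               "url": "mxc://example.org/abc123"}
flagged = {"msgtype": "m.audio", "voice": True, "body": "Voice message",
           "url": "mxc://example.org/abc123"}

print(legacy_render(new_msgtype))   # text: Voice message
print(legacy_render(flagged))       # audio player
print(updated_render(new_msgtype))  # voice player
print(updated_render(flagged))      # voice player
```

An old client degrades to a line of text for `m.voice` but still plays the flagged `m.audio`; an updated client treats both the same. That difference is exactly what the review discussion weighs against keeping each type simple.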
Review comment: Since

Review comment: I don't think we should try and overload

Review comment: Not trying to repeat the discussion we had in #spec, but instead just documenting it, so that it doesn't get lost: Well, this isn't necessarily a spec-defined representation, but a content hint that may change the client-side representation. A lot of clients may opt to represent them the same way initially. We already have an info object for most media types that describes the content more exactly by adding a filename, thumbnail, size, duration, etc. Using a new message type will regress the UX for users at first, since most clients will not understand the msgtype and will default to showing the voice message as text. Some clients may not focus on implementing them in the near future (location messages, proper reply rendering, and similar events are not necessarily implemented in clients currently, for example), which will make sending a voice message a worse experience than sending an audio message. Each new message type also increases the implementation burden for new clients. So basically we have, in my eyes:

I guess choosing one over the other is a trade-off. I prefer the flag, though, since it is easier to implement and will cause fewer bug reports (because one side doesn't support it).

Review comment: I'm honestly not convinced that we should be complicating the spec with a third way of representing events. Yes, older clients will not be able to render the events properly; however, realistically they should be updated shortly after a spec release to handle this sort of thing, even if it is just adding

I would not anticipate an overwhelming number of bug reports for lack of support given historical cases of similar things (largely within the Element clients, but occasionally from the spec too). For example, we've not seen an overwhelming number of bug reports on Matrix projects relating to SSO despite major servers offering it as the only login option.

Review comment: I think that it's a question of whether there's a semantic difference between voice messages and audio files, or if it's just an aesthetic difference. That is, are they different things, or are they the same thing but merely rendered differently? If they are different things, then it should be a different msgtype (in the same way that

Review comment: I guess that is a good point. They probably have more of a semantic difference to the user and some clients will render them differently, but they are mostly the same on the backend. This is similar to stickers, which are usually represented very similarly and behave the same way in the backend (apart from the image source mostly), but we have different event types for them. For consistency it makes sense to have a different msgtype then, because they are different semantically imo. It just sucks for backwards compatibility, but if we want to be forward thinking, a different msgtype is certainly cleaner. Especially considering that notices and voice messages usually also cause a change in notification behaviour. (At least I could imagine setting a different notification sound or treating them more like a one-sided call.)

Review comment: I think the semantic difference is pretty weak. We also see some talk here about including metadata for music files separately. At this point it seems quite weird to have 3 different "styles" of inline-playable audio clips that happen to have all or mostly the same parameters. Even in the examples shown from other messaging services the UI differences are minor. They appear to be the same type of widget with minor tweaks. That being said, I agree it isn't major enough to block the RFC, but it does seem funny to add all of the spec and discussion for minor behaviour tweaks.

## Security considerations

@uhoreg offers:
> Auto-downloading of files (if clients follow WhatsApp's example) sounds
> like it could be a security issue. (e.g. DoS by using up users' bandwidth,
> could cause malicious content to be automatically downloaded)

Review comment: I don't think this is a concern.

This could be solved by having clients handle auto-downloading responsibly,
e.g. by only auto-downloading voice messages from trusted contacts.

Review comment: It would be great to better understand what "trusted contacts" means here. Verified devices, maybe?

Review comment: I don't think it would necessarily be the same as verified users, since they are "trusted" in different ways. "Trusted" here means more that you trust the other person not to behave badly (e.g. by not sending you a massive file, or a file that might trigger a vulnerability in your audio player), whereas verification indicates that you trust that the person is who they claim to be. But you could verify someone without trusting that they are not malicious (e.g. your friend who loves playing bad practical jokes, or an employee of a rival company that you're collaborating with but is still a rival). It would probably be up to the client to figure out how to ask the user who is considered "trusted". Maybe it could do something like what mail clients do with images, where they ask you if you want to load the images in a message, and allow you to select whether to always allow from that sender.
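One way a client might implement "auto-download only from trusted contacts", sketched in Python. Everything here is hypothetical: `AutoDownloadPolicy`, the per-sender allowlist (modelled on the mail-client "always allow from this sender" idea from review, and deliberately kept separate from device verification) and the 1 MB size cap are illustrative choices, not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class AutoDownloadPolicy:
    # Senders the user explicitly chose to "always allow", like a mail client
    # remembering which senders may load images. Not tied to verification.
    trusted_senders: set[str] = field(default_factory=set)
    # Hypothetical cap to limit bandwidth spent on unsolicited downloads.
    max_bytes: int = 1_000_000

    def always_allow(self, sender: str) -> None:
        self.trusted_senders.add(sender)

    def should_auto_download(self, sender: str, content: dict) -> bool:
        if content.get("msgtype") != "m.voice":
            return False  # only voice messages are auto-downloaded
        if sender not in self.trusted_senders:
            return False  # unknown senders need a manual tap
        size = content.get("info", {}).get("size")
        # Missing size: skip, since the download can't be bounded up front.
        return isinstance(size, int) and size <= self.max_bytes


policy = AutoDownloadPolicy()
memo = {"msgtype": "m.voice", "info": {"size": 20_000}}
print(policy.should_auto_download("@alice:example.org", memo))  # False
policy.always_allow("@alice:example.org")
print(policy.should_auto_download("@alice:example.org", memo))  # True
```

Note that `info.size` is reported by the sender, so a client following this approach would still want to enforce the cap while downloading rather than relying on the advertised value alone.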