MSC3245: Voice messages (using extensible events) #3245
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,200 @@ | ||
# MSC3245: Voice messages (using extensible events) | ||
|
||
Voice messages are a useful way to quickly send a message to someone without having to use the more | ||
awkward keyboard. Typically short in length, voice messages can be sent as annotated audio files | ||
to recipients. | ||
Comment on lines +3 to +5: This MSC doesn't motivate why voice messages should exist as a separate concept in addition to the more generic audio messages. I had to go over to MSC2516 to find some (potential) reasons. Could that be added? |
||
|
||
More information about voice messages and what they can be used for can be found on | ||
[MSC2516 - Voice messages via msgtype](https://github.com/matrix-org/matrix-doc/pull/2516). This | ||
MSC inherits many of the beliefs and use cases of that MSC, but packages the event contents | ||
differently. Specifically, this makes use of | ||
[MSC1767 - Extensible Events](https://github.com/matrix-org/matrix-doc/pull/1767). | ||
|
||
## Proposal | ||
|
||
Much like MSC2516, voice messages are defined as OGG files, encoded with Opus, using relatively sane | ||
settings for voice recordings. This proposal does not define specific settings for clients to use, | ||
but strongly recommends keeping files as small as possible without noticeably degrading audio quality. Some | ||
sample settings are: | ||
|
||
* Sample rate: 48kHz | ||
* Bitrate: 24kbps | ||
* Mono (single channel) | ||
* Appropriate complexity and resample quality for the platform. | ||
* Encoder application: 2048 (voice; the default is typically 2049, audio). This doesn't have any significant | ||
impact on the resulting recording. | ||
|
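As a rough illustration of the sample settings above, the sketch below builds an `ffmpeg` command line that encodes a recording as mono 48kHz Opus in an OGG container. This proposal does not mention `ffmpeg`: the tool, the `libopus` encoder, and the helper name `opus_voice_ffmpeg_args` are assumptions made for the example, and clients are free to use whatever encoder suits their platform.

```python
from typing import List


def opus_voice_ffmpeg_args(src: str, dst: str, bitrate_kbps: int = 24) -> List[str]:
    """Build an ffmpeg argument list for a voice recording.

    Hypothetical helper: assumes an ffmpeg build with libopus support.
    The values mirror the sample settings, which are suggestions only.
    """
    return [
        "ffmpeg",
        "-i", src,                   # input recording (any format ffmpeg can read)
        "-c:a", "libopus",           # encode the audio stream with Opus
        "-ar", "48000",              # sample rate: 48kHz
        "-b:a", f"{bitrate_kbps}k",  # bitrate: 24kbps by default
        "-ac", "1",                  # mono (single channel)
        "-application", "voip",      # libopus voice mode (application 2048)
        dst,                         # an .ogg extension selects the OGG container
    ]
```

The list can be passed to `subprocess.run` as-is. Complexity and resample quality are left at the encoder defaults here, since the appropriate values depend on the platform.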
||
We use Opus to be compatible with other messaging platforms, particularly the ones that can be bridged | ||
easily to Matrix. This proposal aims to avoid having bridges (and to a degree, clients) transcode | ||
voice messages as that would likely push voice messages further away from the "faster communication" | ||
use case. Bridges already need to process the events and can add seconds of latency: | ||
an extra couple of seconds to re-encode a voice message would not help. | ||
|
||
No maximum duration is specified, but clients are encouraged not to send long-running recordings | ||
as they might be rejected/ignored on the receiving end for file size reasons. Typically, this should | ||
be less than 5 minutes worth of audio. | ||
|
||
MSC1767 is used to package the voice message up in an annotated event format. Readers of this proposal | ||
are encouraged to [read the MSC](https://github.com/matrix-org/matrix-doc/pull/1767) prior to commenting | ||
or parsing this MSC's use of Extensible Events. | ||
|
||
An example voice message would be: | ||
|
||
```json5 | ||
{ | ||
"type": "m.voice", | ||
"content": { | ||
"m.text": "Voice message", | ||
"m.file": { | ||
"size": 7992, | ||
"name": "Voice message.ogg", | ||
"mimetype": "audio/ogg", | ||
"url": "mxc://example.org/abcdef" | ||
}, | ||
"m.audio": { | ||
"duration": 6541, | ||
"waveform": [0, 256, /*...*/ 1024] // https://github.com/matrix-org/matrix-doc/pull/3246 | ||
}, | ||
"m.voice": {} | ||
}, | ||
// other fields required by the spec, but not important here | ||
} | ||
``` | ||
|
||
Of particular note is that the `m.voice` event definition is empty. This is because fundamentally the event | ||
is an audio event, which means looking at the `m.audio` and `m.file` event bodies for relevant information | ||
about the audio clip. The `m.voice` type is simply used to annotate the event for clients which would like | ||
to render voice messages differently to regular audio files. Clients which don't do anything special for | ||
voice messages can treat `m.voice` as effectively unknown during rendering, likely falling back on the `m.audio` | ||
definition instead. | ||
|
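The fallback behaviour described above could be sketched roughly as follows. The function name `render_hint`, the `supports_voice` switch, and the returned labels are hypothetical and only serve the illustration; the exact fallback order is governed by MSC1767, not by this sketch.

```python
from typing import Any, Dict


def render_hint(content: Dict[str, Any], supports_voice: bool = True) -> str:
    """Pick the richest representation a client can render for an event.

    Hypothetical helper: a client without voice message support passes
    supports_voice=False and therefore treats `m.voice` as unknown.
    """
    has_audio = isinstance(content.get("m.audio"), dict)
    if supports_voice and has_audio and "m.voice" in content:
        return "voice"  # render with the dedicated voice message UI
    if has_audio:
        return "audio"  # fall back to a regular audio player
    if isinstance(content.get("m.file"), dict):
        return "file"   # fall back to a plain file upload
    if "m.text" in content:
        return "text"   # last resort: the textual fallback
    return "unknown"
```

For the example event above, a voice-aware client would get `"voice"`, while a client calling `render_hint(content, supports_voice=False)` would get `"audio"`.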
||
This proposal suggests that the textual fallback be "Voice message" for reasonably sensible push/desktop/email | ||
notifications. Note that MSC1767 supports internationalization, which clients should make use of as needed. | ||
|
||
Text-to-speech and similar capabilities for voice messages would be covered by a future MSC, if desired. | ||
|
||
## Potential issues | ||
|
||
The `m.voice` identifier could probably conflict, as `m.audio` could conflict as well. We may be interested in | ||
discussing `m.message.voice` or similar instead, though likely at MSC1767 rather than this proposal. | ||
|
||
This also annotates events with an empty object and potentially a lot of extra information. This is considered | ||
to be an issue for MSC1767 to consider rather than this MSC, as this MSC is simply saying to create audio events | ||
with some sort of voice message type. | ||
Review comment: It seems like we need to resolve these issues in MSC1767 before we can sensibly complete this MSC. |
||
|
||
## Alternatives | ||
|
||
As mentioned, [MSC2516](https://github.com/matrix-org/matrix-doc/pull/2516) exists to make voice messages a | ||
dedicated `msgtype`, which would work in the near term. There's additionally discussion on that MSC on whether | ||
or not a flag within the event would be more appropriate than a `msgtype`. This proposal believes that both a | ||
`msgtype` and a flag within the `content` would accomplish the same thing as this MSC, but that their fallback | ||
scenarios are less desirable. Flags in the `content` are subject to extensive proliferation if accepted, making | ||
it harder to migrate to event formats like Extensible Events. Dedicated message types have the issue of not all | ||
clients handling unknown message types the same way. Some clients would render it as a plain file upload while | ||
many others would simply ignore the message entirely. This is not a great representation for an audio message. | ||
This proposal counters both problems by using Extensible Events out of the box, which results in clients being | ||
able to render whatever they can, which will typically be either the voice message or an audio event, and fall | ||
back accordingly to the not-great representations if they need to. | ||
|
||
## Security considerations | ||
|
||
Voice messages by nature are human voices being sent over the internet. This can be used for malicious purposes | ||
outside the control of Matrix: users are cautioned to not send voice messages to untrusted places, such as large | ||
public rooms or unknown individuals. | ||
|
||
As with all media events, clients should verify that the contained file is actually an audio file. Playing JPEGs | ||
or executables over the user's speakers is unlikely to go down very well. | ||
|
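One cheap way to catch obviously wrong files before handing them to a decoder is to sniff the container header. The sketch below checks for the OGG page magic (`OggS`) and the `OpusHead` identification packet that starts an Opus stream. The function name `looks_like_ogg_opus` is hypothetical, and this is only a first-pass filter under the assumption that the first page of the file is available: a file that passes can still be malformed, so the decoder must handle errors as well.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Return True if the bytes start like an OGG file carrying Opus audio.

    Hypothetical helper: inspects only the first OGG page and does not
    validate checksums or any later pages.
    """
    # An OGG page header is 27 bytes long and begins with the magic "OggS".
    if len(data) < 27 or data[:4] != b"OggS":
        return False
    # Byte 26 holds the number of entries in the segment table that follows.
    segment_count = data[26]
    payload_start = 27 + segment_count
    # The first packet of an Opus stream begins with the magic "OpusHead".
    return data[payload_start:payload_start + 8] == b"OpusHead"
```

A client might run this on the first few hundred bytes of the download and refuse to offer playback when it returns `False`.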
||
Voice messages are likely best used in encrypted rooms due to the high likelihood that the members of the room are | ||
trusted, and the user's voice is not uploaded plainly to the media repo. Typically, this will be DMs or other | ||
forms of private chats in most clients. | ||
|
||
## Transitional event format | ||
|
||
As mentioned on MSC1767, the Extensible Events format can be mixed in with the existing specification for message | ||
events. For as long as MSC1767's transitional period is in place, the following would represent a voice message: | ||
|
||
```json5 | ||
{ | ||
"type": "m.room.message", // Changed! | ||
"content": { | ||
"body": "Voice message", | ||
"msgtype": "m.audio", // To preserve semantic meaning | ||
"url": "mxc://example.org/abcdef", | ||
"info": { | ||
"duration": 6541, | ||
"mimetype": "audio/ogg", | ||
"size": 7992 | ||
}, | ||
|
||
// All of this is otherwise unchanged. | ||
"m.text": "Voice message", | ||
"m.file": { | ||
"size": 7992, | ||
"name": "Voice message.ogg", | ||
"url": "mxc://example.org/abcdef", | ||
"mimetype": "audio/ogg" | ||
}, | ||
"m.audio": { | ||
"duration": 6541, | ||
"waveform": [0, 256, /*...*/ 1024] | ||
}, | ||
|
||
// This can still be used to distinguish a voice message from an audio message, | ||
// even with the fallback of `msgtype: m.audio` | ||
"m.voice": {} | ||
}, | ||
// other fields required by the spec, but not important here | ||
} | ||
``` | ||
|
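A sending client could assemble the transitional `content` from a handful of inputs, roughly as sketched below. The function name `build_transitional_voice_content` and its parameters are hypothetical; the field names and the fixed `Voice message.ogg` file name are taken from the example above. Uploading the media and sending the event are out of scope for the sketch.

```python
from typing import Any, Dict, List


def build_transitional_voice_content(
    mxc_url: str,
    size: int,
    duration_ms: int,
    waveform: List[int],
    fallback_text: str = "Voice message",
) -> Dict[str, Any]:
    """Build `content` for an `m.room.message` voice message.

    Hypothetical helper: mirrors the transitional example, duplicating the
    legacy fields and the Extensible Events fields from the same inputs.
    """
    mimetype = "audio/ogg"
    return {
        # Legacy fields, for clients that only understand `m.room.message`.
        "body": fallback_text,
        "msgtype": "m.audio",
        "url": mxc_url,
        "info": {"duration": duration_ms, "mimetype": mimetype, "size": size},
        # Extensible Events fields.
        "m.text": fallback_text,
        "m.file": {
            "size": size,
            "name": "Voice message.ogg",
            "url": mxc_url,
            "mimetype": mimetype,
        },
        "m.audio": {"duration": duration_ms, "waveform": waveform},
        # Empty marker that distinguishes a voice message from other audio.
        "m.voice": {},
    }
```

Deriving both halves from the same arguments keeps the legacy and extensible representations from drifting apart.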
||
## Unstable prefix | ||
Review comment: @jonnyandrew says:
Review comment: I don't recall which version is already deployed out in the wild, but would recommend using the one that other clients use. If it turns out that no one implements the v2 stuff (like in the case with Polls), we can fix this MSC. |
||
|
||
While this MSC is not considered stable, implementations should use the following guidelines: | ||
|
||
* Use MSC1767's unstable prefix and migration strategy where possible. | ||
* Use `org.matrix.msc3245.voice` in place of `m.voice`. | ||
* Carefully send events which match the example given below. This is to make other implementations easier to | ||
write. | ||
|
||
Example event (using all the unstable prefixing rules): | ||
|
||
```json5 | ||
{ | ||
"type": "m.room.message", | ||
"content": { | ||
"body": "Voice message", | ||
"msgtype": "m.audio", | ||
"url": "mxc://example.org/abcdef", | ||
"info": { | ||
"duration": 6541, | ||
"mimetype": "audio/ogg", | ||
"size": 7992 | ||
}, | ||
"org.matrix.msc1767.text": "Voice message", | ||
"org.matrix.msc1767.file": { | ||
"size": 7992, | ||
"name": "Voice message.ogg", | ||
"url": "mxc://example.org/abcdef", | ||
"mimetype": "audio/ogg" | ||
}, | ||
"org.matrix.msc1767.audio": { | ||
"duration": 6541, | ||
"waveform": [0, 256, /*...*/ 1024] | ||
}, | ||
|
||
// This can still be used to distinguish a voice message from an audio message, | ||
// even with the fallback of `msgtype: m.audio` | ||
"org.matrix.msc3245.voice": {} | ||
}, | ||
// other fields required by the spec, but not important here | ||
} | ||
``` | ||
|
||
Client implementations should note that, instead of `org.matrix.msc3245.voice`, there are events in the wild using | ||
`org.matrix.msc2516.voice` as a precursor experiment to this MSC. The MSC2516 namespace is not considered | ||
correct, though clients may wish to handle it the same. | ||
|
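A receiving client that wants to tolerate all of these spellings might check for the marker along the following lines. The names `VOICE_KEYS` and `is_voice_message` are hypothetical; whether to accept the MSC2516 namespace at all is left to the client, as noted above.

```python
from typing import Any, Dict

# Marker keys a client may encounter, most preferred first. The MSC2516
# namespace is not considered correct but does appear on existing events.
VOICE_KEYS = (
    "m.voice",
    "org.matrix.msc3245.voice",
    "org.matrix.msc2516.voice",
)


def is_voice_message(content: Dict[str, Any]) -> bool:
    """Return True if the event content carries any known voice marker.

    Hypothetical helper: only the presence of the key matters, since the
    marker's value is an empty object.
    """
    return any(key in content for key in VOICE_KEYS)
```

Senders would still emit only `org.matrix.msc3245.voice` while this MSC is unstable; the wider list applies to reading events only.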
||
## Dependent MSCs | ||
|
||
This MSC requires [MSC1767 - Extensible Events](https://github.com/matrix-org/matrix-doc/pull/1767) in order | ||
to make the most sense in the specification. |