MSC1763: Proposal for specifying configurable message retention periods #1763
Open: ara4n wants to merge 37 commits into old_master from matthew/msc1763
Changes from 13 commits
687b650 first cut of MSC1763 for configurable event retention (ara4n)
f770440 ephemeral msging ended up in scope (ara4n)
b25367e fix english (ara4n)
2aafa02 clarify this only applies to non-state events; fix retention JSON str… (ara4n)
64695ed make conflict alg explicit for user retention settings (ara4n)
c493dbd change max >= min invariant (ara4n)
0afc3af spell out that self-destructing msgs need explicit RRs (ara4n)
7597e03 more validation on fields (ara4n)
7a8d204 spell out how the example server admin overrides would work (ara4n)
4646fcd improve wording; spell out purge/redact dichotomy; add explicit alg (ara4n)
c55158d clarify redaction semantic and default PL (ara4n)
6e33c2f track max's idea of advertising retention per-server (ara4n)
28ea4e1 fix normatives (ara4n)
cca99dd clarify client behaviour (ara4n)
a4974b6 make self_destruct set a timer in seconds rather than be binary. (ara4n)
c27394c clarify warning about conflicts (ara4n)
f0553c0 Merge branch 'master' into matthew/msc1763 (ara4n)
bdce6f1 remove per-message retention and self-destruct messages entirely to t… (ara4n)
a30a853 spell out that events will disappear from event streams when purged (ara4n)
c281420 add the 'why not nego?' tradeoff (ara4n)
ef215dd clarify the intention to not default to finite message retention (ara4n)
0b6a209 spell out not to default to a max_lifetime (ara4n)
5c29779 incorporate review (ara4n)
032e63b Apply suggestions from code review (ara4n)
1a4101e link #2228 (ara4n)
90b17d6 units (ara4n)
32f21ac lifetimes in milliseconds (ara4n)
a1b8726 fix json number ranges (ara4n)
ee0a7ee Update 1763-configurable-retention-periods.md (richvdh)
cabef48 Apply suggestions from code review (ara4n)
f5c3729 incorporate review (ara4n)
f8ceb97 spell out an example UI for warning about retention (ara4n)
8b1a0c3 clarify care & feeding of DAG (ara4n)
9357ec6 incorporate more @richvdh review (ara4n)
ac2f87e Apply suggestions from code review (ara4n)
116c5b9 split out media attachment clean-up to #2278 (ara4n)
f809087 Massively rewrite the proposal (babolivier)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,326 @@ | ||
# Proposal for specifying configurable retention periods for messages. | ||
|
||
A major shortcoming of Matrix has been the inability to specify how long | ||
events should be stored by the servers and clients which participate in a given | ||
room. | ||
|
||
This proposal aims to specify a simple yet flexible set of rules which allow | ||
users, room admins and server admins to determine how long data should be | ||
stored for a room, from the perspective of respecting the privacy requirements | ||
of that room (which may range from "burn after reading" ephemeral messages, | ||
through to FOIA-style public record keeping requirements). | ||
|
||
As well as enforcing privacy requirements, these rules provide a way for server | ||
administrators to better manage disk space (e.g. to enforce rules such as "don't | ||
store remote events for public rooms for more than a month"). | ||
|
||
## Problem: | ||
|
||
Matrix is inherently a protocol for storing and synchronising conversation | ||
history, and various parties may wish to control how long that history is stored | ||
for. | ||
|
||
* Users may wish to specify a maximum age for their messages for privacy | ||
purposes, for instance: | ||
* to avoid their messages (or message metadata) being profiled by | ||
unscrupulous or compromised homeservers | ||
* to avoid their messages in public rooms staying indefinitely on the public | ||
record | ||
* because of legal/corporate requirements to store message history for a | ||
limited period of time | ||
* because of legal/corporate requirements to store messages forever | ||
(e.g. FOIA) | ||
* to provide "ephemeral messaging" semantics where messages are best-effort | ||
deleted after being read. | ||
Review comment: I question the feasibility of this - on what I essentially see as a matrix-specced version of Synapse's History Purge functionality. What would qualify exactly as "after read"? Shouldn't this be removed and left alone for MSC2228 to specify or address? |
||
* Room admins may wish to specify a retention policy for all messages in a | ||
room. | ||
* A room admin may wish to enforce a lower or upper bound on message | ||
retention on behalf of its users, overriding their preferences. | ||
* A bridged room should be able to enforce the data retention policies of the | ||
remote rooms. | ||
* Server admins may wish to specify a retention policy for their copy of given | ||
rooms, in order to manage disk space. | ||
|
||
Additionally, we would like to provide this behaviour whilst also ensuring that | ||
users generally see a consistent view of message history, without lots of gaps | ||
and one-sided conversations where messages have been automatically removed. | ||
|
||
At the least, it should be possible for people participating in a conversation | ||
to know the expected lifetime of the other messages in the conversation **at the | ||
time they are sent** in order to know how best to interact with them (i.e. | ||
whether they are knowingly participating in a future one-sided conversation or | ||
not). | ||
|
||
We would also like to discourage users from setting low message retention as a | ||
matter of course, as it can result in very antisocial conversation patterns to | ||
the detriment of Matrix as a useful communication mechanism. | ||
|
||
This proposal does not try to solve the problems of: | ||
* GDPR erasure (as this involves retrospectively changing the lifetime of | ||
messages) | ||
* Bulk redaction (e.g. to remove all messages from an abusive user in a room, | ||
as again this is retrospectively changing message lifetime) | ||
* Limiting the number (rather than age) of messages stored per room (as this is | ||
more a question of quotaing rather than empowering privacy) | ||
|
||
## Proposal | ||
|
||
### User-specified per-message retention | ||
|
||
Users can specify per-message retention by adding the following fields to the | ||
event within its content. Retention is only considered for non-state events. | ||
|
||
`max_lifetime`: the maximum duration in seconds for which a server must store | ||
this event. Must be null or in range [0, 2<sup>31</sup>-1]. If absent, or null, | ||
should be interpreted as 'forever'. | ||
|
||
`min_lifetime`: the minimum duration in seconds for which a server should store this event. | ||
Must be null or in range [0, 2<sup>31</sup>-1]. If absent, or null, should be | ||
interpreted as 'forever'. | ||
|
||
`self_destruct`: a boolean for whether servers must remove this event after | ||
seeing an explicit read receipt delivered for it. If absent, or null, should | ||
be interpreted as false. | ||
|
||
`expire_on_clients`: a boolean for whether clients must expire messages clientside | ||
to match the min/max lifetime and/or self_destruct semantics fields. If absent, | ||
or null, should be interpreted as false. | ||
|
||
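The field constraints above could be checked along these lines. This is only a sketch: the function name `validate_retention_fields` and the wording of the error strings are hypothetical and not part of the proposal; the range bound and the "absent or null" rules are taken from the field definitions.

```python
from typing import Any, List, Mapping

# Upper bound for lifetimes, as given in the field definitions (2^31 - 1).
MAX_LIFETIME_VALUE = 2**31 - 1


def validate_retention_fields(content: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the fields are valid.

    Hypothetical helper: the proposal only states the constraints, not an API.
    """
    problems = []
    for key in ("max_lifetime", "min_lifetime"):
        value = content.get(key)
        if value is None:
            continue  # absent or null is interpreted as 'forever'
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer or null")
        elif not 0 <= value <= MAX_LIFETIME_VALUE:
            problems.append(f"{key} out of range")
    for key in ("self_destruct", "expire_on_clients"):
        value = content.get(key)
        if value is not None and not isinstance(value, bool):
            problems.append(f"{key} must be a boolean or null")
    return problems
```

For example, `validate_retention_fields({"max_lifetime": 86400})` returns an empty list, while a negative lifetime is reported as out of range.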
For instance: | ||
|
||
```json | ||
{ | ||
    "max_lifetime": 86400 | ||
} | ||
``` | ||
|
||
The above example means that servers receiving this message should store the | ||
event for only 86400 seconds (1 day), as measured from that event's | ||
origin_server_ts, after which they MUST purge all references to that event ID | ||
(e.g. from their db and any in-memory queues). | ||
|
||
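As a worked illustration of the timing rule above, the purge deadline can be derived from `origin_server_ts` (milliseconds since the Unix epoch in Matrix events) and a lifetime in seconds. The function names `expiry_ts_ms` and `is_expired` are hypothetical; this is a sketch, not a specified interface.

```python
from typing import Optional


def expiry_ts_ms(origin_server_ts: int, max_lifetime_s: Optional[int]) -> Optional[int]:
    """Timestamp in ms after which the event must be purged; None means 'forever'."""
    if max_lifetime_s is None:
        return None
    # origin_server_ts is in milliseconds, the lifetime field is in seconds.
    return origin_server_ts + max_lifetime_s * 1000


def is_expired(origin_server_ts: int, max_lifetime_s: Optional[int], now_ms: int) -> bool:
    """True once the current time has reached the purge deadline."""
    deadline = expiry_ts_ms(origin_server_ts, max_lifetime_s)
    return deadline is not None and now_ms >= deadline
```

With `max_lifetime` set to 86400, an event stamped 1546300800000 would become due for purging at 1546387200000.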
We consciously do not redact the event, as we are trying to eliminate | ||
metadata here at the cost of deliberately fracturing the DAG, which will | ||
fragment into disparate chunks. (See "Issues" below in terms of whether this | ||
is actually valid) | ||
|
||
```json | ||
{ | ||
    "min_lifetime": 2419200 | ||
} | ||
``` | ||
|
||
The above example means that servers receiving this message SHOULD store the | ||
event forever, but MAY choose to purge their copy after 28 days (or longer) in | ||
order to reclaim disk space. | ||
|
||
```json | ||
{ | ||
    "self_destruct": true, | ||
    "expire_on_clients": true | ||
} | ||
``` | ||
|
||
The above example describes 'self-destructing message' semantics where both server | ||
and clients MUST purge/delete the event and associated data as soon as a read | ||
receipt for that message is received from the recipient. | ||
|
||
Clients and servers MUST send explicit read receipts per-message for | ||
self-destructing messages (rather than for the most recently read message, | ||
as is the normal operation), so that messages can be destructed as requested. | ||
|
||
These retention fields are preserved during redaction, so that even if the event | ||
is redacted, the original copy can be subsequently purged appropriately from the | ||
DB. | ||
|
||
XXX: This may change if we end up redacting rather than purging events (see | ||
Issues below) | ||
|
||
TODO: do we want to pass these in as querystring params when sending, instead of | ||
putting them inside event.content? | ||
|
||
### User-advertised per-message retention | ||
|
||
If we had extensible profiles, users could advertise their intended per-message | ||
retention in their profile (in global profile or per-room profile) as a useful | ||
social cue. However, this would be purely informational. | ||
|
||
### Room Admin-specified per-room retention | ||
|
||
We introduce a `m.room.retention` state event, which room admins can set to | ||
override the retention behaviour for a given room. This takes the same fields | ||
described above. It follows the default power level (PL) semantics for a state event | ||
(requiring a PL of 50 by default to be set). | ||
|
||
If set, these fields replace any per-message retention behaviour | ||
specified by the user - even if it means forcing laxer privacy requirements on | ||
that user. This is a conscious privacy tradeoff to allow admins to specify | ||
explicit privacy requirements for a room. For instance, a room may explicitly | ||
disable self-destructing messages by setting `self_destruct: false`, or may | ||
require all messages in the room be stored forever with `min_lifetime: null`. | ||
|
||
In the instance of `min_lifetime` or `max_lifetime` being overridden, the | ||
invariant that `max_lifetime >= min_lifetime` must be maintained by clamping | ||
max_lifetime to be equal to `min_lifetime`. | ||
|
||
If the user's retention settings conflict with those in the room, then the | ||
user's clients should warn the user when participating in the room. A conflict | ||
exists if the user sets retention fields on their messages which are specified | ||
with differing values on the `m.room.retention` state event. | ||
|
||
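The conflict rule above (a field set on the message that the room's `m.room.retention` event specifies with a different value) might be checked as follows. `conflicting_fields` is a hypothetical helper name, and plain dicts stand in for the event content and the state event content.

```python
from typing import Any, List, Mapping

# The four retention fields defined earlier in this proposal.
RETENTION_FIELDS = ("max_lifetime", "min_lifetime", "self_destruct", "expire_on_clients")


def conflicting_fields(
    message_content: Mapping[str, Any], room_retention: Mapping[str, Any]
) -> List[str]:
    """Fields the user set on the message that the room specifies differently."""
    return [
        field
        for field in RETENTION_FIELDS
        if field in message_content
        and field in room_retention
        and message_content[field] != room_retention[field]
    ]
```

A client could warn the user whenever this list is non-empty for the values it is about to send.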
### Server Admin-specified per-room retention | ||
|
||
Server admins have two ways of influencing message retention on their server: | ||
|
||
1) Specifying a default `m.room.retention` for rooms created on the server, as | ||
defined as a per-server implementation configuration option which inserts the | ||
state events after creating the room (effectively augmenting the presets used | ||
when creating a room). If a server admin is trying to conserve diskspace, they | ||
may do so by specifying and enforcing a relatively low min_lifetime (e.g. 1 | ||
month), but not specify a max_lifetime, in the hope that other servers will | ||
retain the data for longer. | ||
|
||
XXX: is this the correct approach to take? It's how we force E2E encryption on, | ||
but it feels very fragmentary to have magical presets which do different things | ||
depending on which server you're on. | ||
|
||
2) By adjusting how aggressively their server enforces the `min_lifetime` | ||
value for message retention. For instance, a server admin could configure their | ||
server to attempt to automatically purge remote messages in public rooms which | ||
are older than three months (unless min_lifetime for those messages was set | ||
higher). | ||
|
||
A possible configuration here could be something like: | ||
* target_lifetime_public_remote_events: 3 months | ||
* target_lifetime_public_local_events: null # forever | ||
* target_lifetime_private_remote_events: null # forever | ||
* target_lifetime_private_local_events: null # forever | ||
|
||
...which would try to automatically purge remote events from public rooms after | ||
3 months (assuming their individual min_lifetime is not higher), but leave | ||
others alone. | ||
|
||
These config values would interact with the min_lifetime and max_lifetime values | ||
of a message (either per-message or per-room) in the different classes of room | ||
by decreasing the effective max_lifetime to the proposed value (whilst | ||
preserving the `max_lifetime >= min_lifetime` invariant). However, the precise | ||
behaviour would be up to the server implementer. | ||
|
||
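One possible reading of the interaction described above, sketched in Python. The config key names come from the example list; the helper names, the choice of 90 days for "3 months", and the handling of `None` as 'forever' are assumptions, since the proposal leaves the precise behaviour to the server implementer.

```python
from typing import Mapping, Optional

NINETY_DAYS_S = 90 * 24 * 60 * 60  # "3 months", approximated here as 90 days


def target_lifetime(
    config: Mapping[str, Optional[int]], is_public: bool, is_local: bool
) -> Optional[int]:
    """Look up the configured target for this class of room and event origin."""
    visibility = "public" if is_public else "private"
    origin = "local" if is_local else "remote"
    return config.get(f"target_lifetime_{visibility}_{origin}_events")


def lowered_max_lifetime(
    min_lifetime: Optional[int], max_lifetime: Optional[int], target: Optional[int]
) -> Optional[int]:
    """Lower the effective max_lifetime to the target, never below min_lifetime."""
    if target is None or min_lifetime is None:
        # No target configured, or the event must be kept forever: leave max alone.
        return max_lifetime
    lowered = target if max_lifetime is None else min(max_lifetime, target)
    # Preserve the max_lifetime >= min_lifetime invariant.
    return max(lowered, min_lifetime)
```

Under these assumptions a remote event in a public room with `min_lifetime` 0 and no `max_lifetime` would get an effective maximum of 90 days, while an event whose `min_lifetime` exceeds the target keeps its minimum.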
XXX: should this configuration be specced or left as an implementation-specific | ||
config option? | ||
|
||
Server admins could also override the requested retention limits (e.g. if resource | ||
constrained), but this isn't recommended given it may result in history being | ||
irrevocably lost against the senders' wishes. | ||
|
||
## Client-side behaviour | ||
|
||
Clients should independently calculate the retention of a message based on the | ||
event fields and the room state, and show the message lifespan in the UI. If a | ||
message has a finite lifespan that fact MUST be indicated clearly in the timeline | ||
to allow users to correctly interact with the message. (The details of the | ||
lifespan can be shown on demand, however). | ||
|
||
If `expire_on_clients` is true, then clients should also calculate expiration for | ||
said events and delete them from their local stores as required. | ||
|
||
## Pruning algorithm | ||
|
||
To summarise, servers and clients must implement the pruning algorithm as | ||
follows: | ||
|
||
If we're a client, apply the algorithm if: | ||
* if specified, the `expire_on_clients` field in the `m.room.retention` event for the room is true. | ||
* otherwise, if specified, the message's `expire_on_clients` field is true. | ||
* otherwise, don't apply the algorithm. | ||
|
||
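The client-side precedence above (room setting first, then the message's own field, otherwise do nothing) could look like the sketch below. `client_should_expire` is a hypothetical name, and treating a null room value as "not specified" is an assumption.

```python
from typing import Any, Mapping


def client_should_expire(
    room_retention: Mapping[str, Any], message_content: Mapping[str, Any]
) -> bool:
    """Decide whether a client applies the pruning algorithm to this message."""
    room_value = room_retention.get("expire_on_clients")
    if room_value is not None:
        # The room-level setting, when specified, takes precedence.
        return room_value is True
    # Otherwise fall back to the message's own field; absent or null means false.
    return message_content.get("expire_on_clients") is True
```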
The maximum lifetime of an event is calculated as: | ||
* if specified, the `max_lifetime` field in the `m.room.retention` event for the room. | ||
* otherwise, if specified, the message's `max_lifetime` field. | ||
* otherwise, the message's maximum lifetime is considered 'forever'. | ||
|
||
The minimum lifetime of an event is calculated as: | ||
* if specified, the `min_lifetime` field in the `m.room.retention` event for the room. | ||
* otherwise, if specified, the message's `min_lifetime` field. | ||
* otherwise, the message's minimum lifetime is considered 'forever'. | ||
* for clients, `min_lifetime` should be considered to be 0 (as there is no | ||
requirement for clients to persist events). | ||
|
||
If the calculated max_lifetime is less than the min_lifetime then the max_lifetime | ||
is set to be equal to the min_lifetime. | ||
|
||
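The lifetime calculation and the clamping step above can be sketched as one function. `effective_lifetimes` is a hypothetical name and `None` stands for 'forever'. The text does not spell out how a 'forever' value interacts with the clamp, so this sketch only clamps when both values are finite; that choice is an assumption.

```python
from typing import Any, Mapping, Optional, Tuple


def effective_lifetimes(
    room_retention: Mapping[str, Any],
    message_content: Mapping[str, Any],
    is_client: bool = False,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min_lifetime, max_lifetime) in seconds; None means 'forever'."""

    def pick(field: str) -> Optional[int]:
        # A field specified on m.room.retention overrides the message's field.
        if field in room_retention:
            return room_retention[field]
        return message_content.get(field)  # absent -> None -> 'forever'

    max_lifetime = pick("max_lifetime")
    # Clients are not required to persist events, so their minimum is 0.
    min_lifetime = 0 if is_client else pick("min_lifetime")

    # Clamp: if max < min then max is set equal to min (finite values only).
    if max_lifetime is not None and min_lifetime is not None and max_lifetime < min_lifetime:
        max_lifetime = min_lifetime
    return min_lifetime, max_lifetime
```

For instance, a room that sets `min_lifetime` to 7200 while a message asks for `max_lifetime` 3600 yields `(7200, 7200)`.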
The server/client then selects a lifetime of the event to lie between the | ||
calculated values of minimum and maximum lifetime, based on their implementation | ||
and configuration requirements. The selected lifetime MUST NOT exceed the | ||
calculated maximum lifetime. The selected lifetime SHOULD NOT be less than the | ||
calculated minimum lifetime, but may be less in case of constrained resources, | ||
in which case the server should prioritise retaining locally generated events | ||
over remote generated events. | ||
|
||
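A minimal sketch of the selection step, assuming the implementation has some preferred lifetime of its own (the `preferred` parameter and the name `select_lifetime` are hypothetical). The cap at the maximum is applied last so that the MUST NOT rule always wins over the SHOULD NOT rule; the resource-constrained exception is not modelled.

```python
from typing import Optional


def select_lifetime(
    min_lifetime: Optional[int], max_lifetime: Optional[int], preferred: Optional[int]
) -> Optional[int]:
    """Fit a preferred lifetime between min and max; None means 'forever'."""
    chosen = preferred
    # SHOULD NOT be less than the calculated minimum lifetime.
    if min_lifetime is None:
        chosen = None
    elif chosen is not None and chosen < min_lifetime:
        chosen = min_lifetime
    # MUST NOT exceed the calculated maximum lifetime (applied last).
    if max_lifetime is not None and (chosen is None or chosen > max_lifetime):
        chosen = max_lifetime
    return chosen
```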
Server/clients then set a maintenance task to remove ("purge") the event and | ||
references to its event ID from their DB and in-memory queues after the lifetime | ||
has expired (starting timing from the absolute origin_server_ts on the event). | ||
|
||
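The maintenance task could be as simple as a periodic sweep. In this sketch an in-memory dict and list stand in for the DB and queues, and `lifetime_s` is a hypothetical per-event field holding the lifetime the server selected (`None` for 'forever'); none of these names come from the proposal.

```python
from typing import Any, Dict, List


def purge_expired(store: Dict[str, Dict[str, Any]], queue: List[str], now_ms: int) -> List[str]:
    """Remove expired events from the store and queue; return the purged event IDs."""
    expired = {
        event_id
        for event_id, event in store.items()
        if event["lifetime_s"] is not None
        # Timing starts from the absolute origin_server_ts (milliseconds).
        and event["origin_server_ts"] + event["lifetime_s"] * 1000 <= now_ms
    }
    for event_id in expired:
        del store[event_id]
    # Drop references to purged event IDs from the in-memory queue as well.
    queue[:] = [event_id for event_id in queue if event_id not in expired]
    return sorted(expired)
```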
As a special case, servers and clients should immediately purge the event, on observing | ||
a read receipt for that specific event ID, if: | ||
* if specified, the `self_destruct` field in the `m.room.retention` event for the room is true. | ||
* otherwise, if specified, the message's `self_destruct` field is true. | ||
|
||
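The special case above might be handled when a read receipt arrives, roughly as follows. `on_read_receipt` and `wants_self_destruct` are hypothetical names, a dict stands in for the event store, and treating a null room value as "not specified" is an assumption.

```python
from typing import Any, Dict, Mapping


def wants_self_destruct(
    room_retention: Mapping[str, Any], message_content: Mapping[str, Any]
) -> bool:
    """Resolve self_destruct: the room setting wins, then the message's field."""
    room_value = room_retention.get("self_destruct")
    if room_value is not None:
        return room_value is True
    return message_content.get("self_destruct") is True


def on_read_receipt(
    store: Dict[str, Dict[str, Any]], room_retention: Mapping[str, Any], event_id: str
) -> bool:
    """Purge the event named by an explicit read receipt; return True if purged."""
    event = store.get(event_id)
    if event is None:
        return False
    if wants_self_destruct(room_retention, event.get("content", {})):
        del store[event_id]
        return True
    return False
```

Note that a room setting `self_destruct: false` prevents the purge even when the message itself asks for it, matching the room-admin override described earlier.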
If possible, servers/clients should remove downstream notifications of a message | ||
once it has expired (e.g. by cancelling push notifications). | ||
|
||
## Tradeoffs | ||
|
||
This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios, | ||
as it attempts to build a coherent UX around the use case of users knowing their | ||
privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is | ||
handled elsewhere (and involves hiding rather than purging messages, in order to | ||
avoid annihilating conversation history), and mega-redaction is yet to be defined. | ||
|
||
## Issues | ||
|
||
It's debatable as to whether we're better off applying the redaction algorithm | ||
to expired events (and thus keep the integrity of the DAG intact, at the expense | ||
of leaking metadata), or whether to purge instead (as per the current proposal), | ||
which will punch holes in the DAG and potentially break the ability to backpaginate | ||
the room. | ||
|
||
How do we handle scenarios where users try to re-backfill in history which has | ||
already been purged? This should presumably be a server admin option on whether | ||
to allow it or not, and if allowed, configure how long the backfill should persist | ||
for before being purged again? | ||
|
||
How do we handle retention of media uploads (especially for E2E rooms)? It feels like | ||
the upload itself might warrant retention values applied to it. | ||
|
||
Should room retention be announced in a room per-server? The advantage is full | ||
flexibility in terms of servers announcing their different policies for a room | ||
(and possibly letting users know how likely history is to be retained, or conversely | ||
letting servers know if they need to step up to retain history). The disadvantage | ||
is that it could make for very complex UX for end-users: "Warning, some servers in | ||
this room have overridden history retention to conflict with your preferences" etc. | ||
|
||
## Security considerations | ||
|
||
There's scope for abuse where users can send abusive messages into a room with a | ||
short max_lifetime and/or self_destruct set true which promptly self-destruct. | ||
|
||
One solution for this could be for server implementations to implement a quarantine | ||
mode which initially marks purged events as quarantined for N days before deleting | ||
them entirely, allowing server admins to address abuse concerns. | ||
|
||
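A quarantine mode along these lines could be sketched as below. Everything here is hypothetical (the function names, the dict-based quarantine area, the `quarantine_days` parameter); the proposal only suggests the idea of holding purged events for N days.

```python
from typing import Any, Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000


def quarantine_event(
    quarantine: Dict[str, Tuple[int, Dict[str, Any]]],
    event_id: str,
    event: Dict[str, Any],
    now_ms: int,
) -> None:
    """Instead of deleting a purged event outright, park it with a timestamp."""
    quarantine[event_id] = (now_ms, event)


def sweep_quarantine(
    quarantine: Dict[str, Tuple[int, Dict[str, Any]]], now_ms: int, quarantine_days: int
) -> List[str]:
    """Delete events quarantined for at least quarantine_days; return their IDs."""
    cutoff = now_ms - quarantine_days * DAY_MS
    expired = [eid for eid, (since, _event) in quarantine.items() if since <= cutoff]
    for eid in expired:
        del quarantine[eid]
    return expired
```

During the quarantine window a server admin could still inspect the event to address abuse reports; after the sweep it is gone entirely.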
## Conclusion | ||
|
||
Previous attempts to solve this have got stuck by trying to combine together too many | ||
disparate problems (e.g. reclaiming diskspace; aiding user data privacy; self-destructing | ||
messages; mega-redaction; clearing history on specific devices; etc) - see | ||
https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447 | ||
for the history. | ||
|
||
This proposal attempts to simplify things to strictly considering the question of | ||
how long servers should persist events for (with the extension of self-destructing | ||
messages added more to validate that the design is able to support such a feature). |