MSC1763: Proposal for specifying configurable message retention periods #1763
Open: ara4n wants to merge 37 commits into old_master from matthew/msc1763
Changes from 1 commit
37 commits
687b650 first cut of MSC1763 for configurable event retention (ara4n)
f770440 ephemeral msging ended up in scope (ara4n)
b25367e fix english (ara4n)
2aafa02 clarify this only applies to non-state events; fix retention JSON str… (ara4n)
64695ed make conflict alg explicit for user retention settings (ara4n)
c493dbd change max >= min invariant (ara4n)
0afc3af spell out that self-destructing msgs need explicit RRs (ara4n)
7597e03 more validation on fields (ara4n)
7a8d204 spell out how the example server admin overrides would work (ara4n)
4646fcd improve wording; spell out purge/redact dichotomy; add explicit alg (ara4n)
c55158d clarify redaction semantic and default PL (ara4n)
6e33c2f track max's idea of advertising retention per-server (ara4n)
28ea4e1 fix normatives (ara4n)
cca99dd clarify client behaviour (ara4n)
a4974b6 make self_destruct set a timer in seconds rather than be binary. (ara4n)
c27394c clarify warning about conflicts (ara4n)
f0553c0 Merge branch 'master' into matthew/msc1763 (ara4n)
bdce6f1 remove per-message retention and self-destruct messages entirely to t… (ara4n)
a30a853 spell out that events will disappear from event streams when purged (ara4n)
c281420 add the 'why not nego?' tradeoff (ara4n)
ef215dd clarify the intention to not default to finite message retention (ara4n)
0b6a209 spell out not to default to a max_lifetime (ara4n)
5c29779 incorporate review (ara4n)
032e63b Apply suggestions from code review (ara4n)
1a4101e link #2228 (ara4n)
90b17d6 units (ara4n)
32f21ac lifetimes in milliseconds (ara4n)
a1b8726 fix json number ranges (ara4n)
ee0a7ee Update 1763-configurable-retention-periods.md (richvdh)
cabef48 Apply suggestions from code review (ara4n)
f5c3729 incorporate review (ara4n)
f8ceb97 spell out an example UI for warning about retention (ara4n)
8b1a0c3 clarify care & feeding of DAG (ara4n)
9357ec6 incorporate more @richvdh review (ara4n)
ac2f87e Apply suggestions from code review (ara4n)
116c5b9 split out media attachment clean-up to #2278 (ara4n)
f809087 Massively rewrite the proposal (babolivier)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,243 @@ | ||
# Proposal for specifying configurable retention periods for messages. | ||
|
||
A major shortcoming of Matrix has been the inability to specify how long | ||
events should be stored by the servers and clients which participate in a given | ||
room. | ||
|
||
This proposal aims to specify a simple yet flexible set of rules which allow | ||
users, room admins and server admins to determine how long data should be | ||
stored for a room, from the perspective of respecting the privacy requirements | ||
of that room (which may range from "burn after reading" ephemeral messages, | ||
through to FOIA-style public record keeping requirements). | ||
|
||
As well as enforcing privacy requirements, these rules provide a way for server | ||
administrators to better manage disk space (e.g. to enforce rules such as "don't | ||
store remote events for public rooms for more than a month"). | ||
|
||
## Problem: | ||
|
||
Matrix is inherently a protocol for storing and synchronising conversation | ||
history, and various parties may wish to control how long that history is stored | ||
for. | ||
|
||
* Users may wish to specify a maximum age for their messages for privacy | ||
  purposes, for instance: | ||
  * to avoid their messages (or message metadata) being profiled by | ||
    unscrupulous or compromised homeservers | ||
  * to avoid their messages in public rooms staying indefinitely on the public | ||
    record | ||
  * because of legal/corporate requirements to store message history for a | ||
    limited period of time | ||
  * because of legal/corporate requirements to store messages forever | ||
    (e.g. FOIA) | ||
  * to provide "ephemeral messaging" semantics where messages are best-effort | ||
    deleted after being read. | ||
Review comment: I question the feasibility of this - on what I essentially see as a matrix-specced version of Synapse's History Purge functionality. What would qualify exactly as "after read"? Shouldn't this be removed and left alone for MSC2228 to specify or address?
|
||
* Room admins may wish to specify a retention policy for all messages in a | ||
room. | ||
  * A room admin may wish to enforce a lower or upper bound on message | ||
    retention on behalf of its users, overriding their preferences. | ||
* A bridged room should be able to enforce the data retention policies of the | ||
remote rooms. | ||
* Server admins may wish to specify a retention policy for their copy of given | ||
rooms, in order to manage disk space. | ||
|
||
Additionally, we would like to provide this behaviour whilst also ensuring that | ||
users generally see a consistent view of message history, without lots of gaps | ||
and one-sided conversations where messages have been automatically removed. | ||
|
||
At the least, it should be possible for people participating in a conversation | ||
to know the expected lifetime of the other messages in the conversation **at the | ||
time they are sent** in order to know how best to interact with them (i.e. | ||
whether they are knowingly participating in a future one-sided conversation or | ||
not). | ||
|
||
We would also like to discourage users from setting low message retention as a | ||
matter of course, as it can result in very antisocial conversation patterns to | ||
the detriment of Matrix as a useful communication mechanism. | ||
|
||
This proposal does not try to solve the problems of: | ||
* GDPR erasure (as this involves retrospectively changing the lifetime of | ||
messages) | ||
* Bulk redaction (e.g. to remove all messages from an abusive user in a room, | ||
as again this is retrospectively changing message lifetime) | ||
* Limiting the number (rather than age) of messages stored per room (as this is | ||
more a question of quotas than of empowering privacy) | ||
* Ephemeral messaging? | ||
|
||
## Proposal | ||
|
||
### User-specified per-message retention | ||
|
||
Users can specify per-message retention by adding the following fields to the | ||
event alongside its content: | ||
|
||
`max_lifetime`: | ||
the maximum duration in seconds for which a well-behaved server should store | ||
this event. If absent, or null, it should be interpreted as 'forever'. | ||
|
||
|
||
`min_lifetime`: | ||
the minimum duration in seconds for which a well-behaved server should store | ||
this event. If absent, or null, it should be interpreted as 'forever'. | ||
|
||
`self_destruct`: | ||
a boolean for whether well-behaved servers should remove this event after | ||
seeing an explicit read receipt delivered for it. | ||
|
||
`expire_on_clients`: | ||
a boolean for whether well-behaved clients should expire messages clientside | ||
to match the semantics of the min/max lifetime and/or self_destruct fields. | ||
|
||
For instance: | ||
|
||
```json | ||
{ | ||
"type": "m.room.message", | ||
"max_lifetime": 86400, | ||
"content": ... | ||
} | ||
``` | ||
|
||
The above example means that servers receiving this message should store the | ||
event for only 86400 seconds (1 day), as measured from that event's | ||
origin_server_ts, after which they MUST prune the event from their | ||
DBs. We consciously do not redact the event, as we are trying to eliminate | ||
metadata here at the cost of deliberately fracturing the DAG (which will | ||
fragment into disparate chunks). | ||
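
As a rough illustration of the expiry rule described above, the following Python sketch checks whether an event has outlived its `max_lifetime`. The function name `is_past_max_lifetime` is hypothetical, and the sketch assumes that `origin_server_ts` is expressed in milliseconds while `max_lifetime` is in seconds; it also assumes the event becomes prunable at the exact moment the lifetime elapses.

```python
from typing import Optional


def is_past_max_lifetime(event: dict, now_ms: int) -> bool:
    """Return True if the event has outlived its max_lifetime.

    Hypothetical helper, not part of the proposal text.
    """
    max_lifetime: Optional[int] = event.get("max_lifetime")
    if max_lifetime is None:
        # Absent or null means 'forever', so the event never expires here.
        return False
    # Assumption: origin_server_ts is in milliseconds, max_lifetime in seconds.
    expires_at_ms = event["origin_server_ts"] + max_lifetime * 1000
    return now_ms >= expires_at_ms


# Example event with a one-day lifetime (timestamp chosen arbitrarily).
example_event = {
    "type": "m.room.message",
    "origin_server_ts": 1_700_000_000_000,
    "max_lifetime": 86400,
    "content": {"body": "hello"},
}
ONE_DAY_MS = 86400 * 1000
```

A server following this sketch would run the check periodically and prune (rather than redact) any event for which it returns true.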
|
||
```json | ||
{ | ||
"type": "m.room.message", | ||
"min_lifetime": 2419200, | ||
"content": ... | ||
} | ||
``` | ||
|
||
The above example means that servers receiving this message SHOULD store the | ||
event forever, but MAY choose to prune their copy after 28 days (or longer) in | ||
order to reclaim disk space. | ||
|
||
```json | ||
{ | ||
"type": "m.room.message", | ||
"self_destruct": true, | ||
"expire_on_clients": true, | ||
"content": ... | ||
} | ||
``` | ||
|
||
The above example describes 'self-destructing message' semantics where both servers | ||
and clients MUST prune/delete the event and associated data as soon as a read | ||
receipt is received from the recipient. | ||
|
||
### User-advertised per-message retention | ||
|
||
If we had extensible profiles, users could advertise their intended per-message | ||
retention in their profile (in global profile or per-room profile) as a useful | ||
social cue. However, this would be purely informational. | ||
|
||
### Room Admin-specified per-room retention | ||
|
||
We introduce an `m.room.retention` state event, which room admins can set to | ||
override the retention behaviour for a given room. This takes the same fields | ||
described above. | ||
|
||
If set, these fields directly override any per-message retention behaviour | ||
specified by the user - even if it means forcing laxer privacy requirements on | ||
that user. This is a conscious privacy tradeoff to allow admins to specify | ||
explicit privacy requirements for a room. For instance, a room may explicitly | ||
disable self-destructing messages by setting `self_destruct: false`, or may | ||
require all messages in the room be stored forever with `min_lifetime: null`. | ||
|
||
In the instance of `min_lifetime` or `max_lifetime` being overridden, the | ||
invariant that `max_lifetime > min_lifetime` must be maintained by clamping | ||
max_lifetime to be greater than `min_lifetime`. | ||
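
One possible reading of the override and clamping rules is sketched below in Python. The function `effective_retention` is hypothetical. The sketch assumes that a field counts as "set" in the room event whenever its key is present (so an explicit `null` still overrides), that clamping is only applied when both lifetimes are explicitly present, and that clamping raises `max_lifetime` to `min_lifetime + 1`; the proposal does not state the exact clamped value.

```python
RETENTION_FIELDS = ("min_lifetime", "max_lifetime", "self_destruct", "expire_on_clients")


def effective_retention(event: dict, room_retention: dict) -> dict:
    """Merge per-message retention fields with the room's m.room.retention content."""
    # Start from whatever the sender specified on the event.
    policy = {k: event[k] for k in RETENTION_FIELDS if k in event}
    # Fields present in the room state event override the sender's values.
    policy.update({k: room_retention[k] for k in RETENTION_FIELDS if k in room_retention})

    # Maintain max_lifetime > min_lifetime when both are explicitly given.
    if "min_lifetime" in policy and "max_lifetime" in policy:
        min_l, max_l = policy["min_lifetime"], policy["max_lifetime"]
        if max_l is not None:
            if min_l is None:
                # A minimum of 'forever' leaves no room for a finite maximum.
                policy["max_lifetime"] = None
            elif max_l <= min_l:
                # Assumption: clamp to the smallest value strictly above min.
                policy["max_lifetime"] = min_l + 1
    return policy
```

For instance, a message sent with `max_lifetime: 3600` into a room whose state sets `min_lifetime: 86400` would end up with a clamped `max_lifetime` of 86401 under these assumptions, which is the kind of conflict a client would want to warn about.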
|
||
If the user's retention settings conflict with those in the room, then the user's | ||
clients should warn the user. | ||
|
||
### Server Admin-specified per-room retention | ||
|
||
Server admins have two ways of influencing message retention on their server: | ||
|
||
1) Specifying a default `m.room.retention` for rooms created on the server, | ||
defined as a per-server implementation configuration option which inserts the | ||
state events after creating the room (effectively augmenting the presets used | ||
when creating a room). If a server admin is trying to conserve disk space, they | ||
may do so by specifying and enforcing a relatively low min_lifetime (e.g. 1 | ||
month), but not specifying a max_lifetime, in the hope that other servers will | ||
retain the data for longer. | ||
|
||
XXX: is this the correct approach to take? It's how we force E2E encryption on, | ||
but it feels very fragmentary and magical for presets to do different things | ||
depending on which server you're on. | ||
|
||
2) By adjusting how aggressively their server enforces the `min_lifetime` | ||
value for message retention. For instance, a server admin could configure their | ||
server to attempt to automatically purge remote messages in public rooms which | ||
are older than three months (unless min_lifetime for those messages was set | ||
higher). | ||
|
||
The expected configuration here could be something like: | ||
* target_lifetime_public_remote_events: 3 months | ||
* target_lifetime_public_local_events: null # forever | ||
* target_lifetime_private_remote_events: null # forever | ||
* target_lifetime_private_local_events: null # forever | ||
|
||
...which would try to automatically purge remote events from public rooms after | ||
3 months (assuming their individual min_lifetime is not higher), but leave | ||
others alone. | ||
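
The purge decision implied by this example configuration could look roughly like the Python sketch below. The `TARGET_LIFETIMES` table and the `should_purge` function are hypothetical, a "month" is assumed to be 30 days, and lifetimes are assumed to be in seconds. The sketch follows the field definition literally, so an absent or null `min_lifetime` means 'forever' and the event is never purged; a real implementation might well choose differently.

```python
from typing import Optional

MONTH_SECONDS = 30 * 24 * 60 * 60  # assumption: one month is 30 days

# Hypothetical in-memory form of the example configuration; None means forever.
TARGET_LIFETIMES = {
    ("public", "remote"): 3 * MONTH_SECONDS,
    ("public", "local"): None,
    ("private", "remote"): None,
    ("private", "local"): None,
}


def should_purge(age_seconds: int, min_lifetime: Optional[int],
                 visibility: str, origin: str) -> bool:
    """Decide whether the server may purge its copy of an event."""
    target = TARGET_LIFETIMES[(visibility, origin)]
    if target is None:
        # No target lifetime configured for this class of event.
        return False
    if min_lifetime is None:
        # Absent or null min_lifetime is read as 'forever'.
        return False
    # Purge only once both the server target and the event's minimum have passed.
    return age_seconds > max(target, min_lifetime)
```

With this table, a four-month-old remote event in a public room is purged if its `min_lifetime` is 28 days, but kept if its `min_lifetime` is six months.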
|
||
XXX: should this configuration be specced or left as an implementation-specific | ||
config option? | ||
|
||
Server admins could also override the requested retention limits (e.g. if resource | ||
constrained), but this isn't recommended given it may result in history being | ||
irrevocably lost against the senders' wishes. | ||
|
||
## Client-side behaviour | ||
|
||
Clients should independently calculate the retention of a message based on the | ||
event fields and the room state, and show the message lifespan in the UI. If a | ||
message has a finite lifespan, that fact MUST be indicated clearly in the timeline | ||
to allow users to correctly interact with the message. (The details of the | ||
lifespan can be shown on demand, however). | ||
|
||
If `expire_on_clients` is true, then clients should also calculate expiration for | ||
said events and delete them from their local stores as required. | ||
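
A client might implement this along the lines of the Python sketch below. Both function names are hypothetical. The sketch assumes the retention fields on each event already reflect any room-level overrides, that `origin_server_ts` is in milliseconds and `max_lifetime` in seconds, and it only handles lifetime-based expiry (not `self_destruct`).

```python
from typing import List, Optional


def remaining_lifetime_ms(event: dict, now_ms: int) -> Optional[int]:
    """Milliseconds until the event expires, or None if it has no finite lifespan."""
    max_lifetime = event.get("max_lifetime")
    if max_lifetime is None:
        return None
    # Assumption: origin_server_ts in milliseconds, max_lifetime in seconds.
    expires_at_ms = event["origin_server_ts"] + max_lifetime * 1000
    return max(0, expires_at_ms - now_ms)


def expire_local_store(events: List[dict], now_ms: int) -> List[dict]:
    """Return the events the client should keep in its local store."""
    kept = []
    for event in events:
        remaining = remaining_lifetime_ms(event, now_ms)
        if event.get("expire_on_clients") and remaining == 0:
            # Expired and flagged for clientside expiry: drop it.
            continue
        kept.append(event)
    return kept
```

The value returned by `remaining_lifetime_ms` is what a timeline could use to mark a message as having a finite lifespan, with the exact figure shown on demand.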
|
||
## Tradeoffs | ||
|
||
This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios, | ||
as it attempts to build a coherent UX around the use case of users knowing their | ||
privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is | ||
handled elsewhere (and involves hiding rather than purging messages, in order to | ||
avoid annihilating conversation history), and mega-redaction is yet to be defined. | ||
|
||
## Potential issues | ||
|
||
How do we handle scenarios where users try to re-backfill in history which has | ||
already been purged? This should presumably be a server admin option on whether | ||
to allow it or not, and if allowed, configure how long the backfill should persist | ||
for before being purged again? | ||
|
||
## Security considerations | ||
|
||
There's scope for abuse where users can send abusive messages into a room with a | ||
short max_lifetime and/or self_destruct set true which promptly self-destruct. | ||
|
||
One solution for this could be for server implementations to implement a quarantine | ||
mode which initially marks purged events as quarantined for N days before deleting | ||
them entirely, allowing server admins to address abuse concerns. | ||
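
The quarantine idea could be sketched as a two-stage store, as in the Python example below. The `QuarantineStore` class and its method names are hypothetical and purely illustrative; a real homeserver would implement this against its database rather than in-memory dictionaries.

```python
DAY_MS = 24 * 60 * 60 * 1000


class QuarantineStore:
    """Toy event store where purged events are held for N days before deletion."""

    def __init__(self, quarantine_days: int) -> None:
        self.quarantine_ms = quarantine_days * DAY_MS
        self.live = {}          # event_id -> event
        self.quarantined = {}   # event_id -> (event, quarantined_at_ms)

    def purge(self, event_id: str, now_ms: int) -> None:
        """Move an event out of the live store into quarantine."""
        event = self.live.pop(event_id, None)
        if event is not None:
            self.quarantined[event_id] = (event, now_ms)

    def sweep(self, now_ms: int) -> list:
        """Permanently delete events whose quarantine period has elapsed."""
        expired = [
            event_id
            for event_id, (_, since_ms) in self.quarantined.items()
            if now_ms - since_ms >= self.quarantine_ms
        ]
        for event_id in expired:
            del self.quarantined[event_id]
        return expired
```

During the quarantine window the event is no longer served to users but remains available to server admins investigating abuse reports.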
|
||
## Conclusion | ||
|
||
Previous attempts to solve this have got stuck by trying to combine together too many | ||
disparate problems (e.g. reclaiming disk space; aiding user data privacy; self-destructing | ||
messages; mega-redaction; clearing history on specific devices; etc) - see | ||
https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447 | ||
for the history. | ||
|
||
This proposal attempts to simplify things to strictly considering the question of | ||
how long servers should persist events for (with the extension of self-destructing | ||
messages added more to validate that the design is able to support such a feature). |