MSC2278: Deleting attachments for expired and redacted messages #2278
base: old_master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
# Proposal for deleting content for expired and redacted messages | ||
|
||
## Overview | ||
|
||
[MSC1763](https://github.com/matrix-org/matrix-doc/pull/1763) proposes | ||
the `m.room.retention` state event for defining how aggressively servers | ||
should purge old messages for a given room. | ||
|
||
It originally also specified how media for purged events should be purged from | ||
disk, however this was split out into a new MSC [by | ||
request](https://github.com/matrix-org/matrix-doc/pull/1763#discussion_r320289119) | ||
during review. This proposal also solves | ||
https://github.com/vector-im/riot-meta/issues/168 - the ability to garbage | ||
collect attachments from redacted events. | ||
|
||
## Proposal | ||
|
||
We handle encrypted & unencrypted rooms differently. Both require an API to | ||
delete content from the local media repo (bug | ||
[#790](https://github.com/matrix-org/matrix-doc/issues/790)), for which we | ||
propose: | ||
|
||
``` | ||
DELETE /_matrix/media/r0/download/{serverName}/{mediaId} | ||
|
||
``` | ||
|
||
The API would return: | ||
* `200 OK {}` on success | ||
|
||
* `403` with error `M_FORBIDDEN` if the access_token is invalid or the user is not authorised to delete the content. | ||
|
||
* `404` with error `M_NOT_FOUND` if the content described in the URL does not exist on the local server. | ||
|
||
The user must be authenticated via access_token or Authorization header as the | ||
original uploader, or server admin (as determined by the server implementation). | ||
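The response rules above can be sketched as a small handler. This is only an illustration under stated assumptions: `Media`, `handle_delete` and the in-memory `store` are hypothetical names, the order in which the 403 and 404 checks run is an implementation choice the proposal does not fix, and quarantine and remote-cache handling are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Media:
    uploader: str  # user ID of the original uploader (hypothetical field)


def handle_delete(
    store: Dict[Tuple[str, str], Media],
    server_name: str,
    media_id: str,
    requester: Optional[str],
    is_admin: bool = False,
) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for a DELETE request.

    `requester` is the user ID resolved from the access token,
    or None if the token was missing or invalid.
    """
    if requester is None:
        # Invalid or missing access_token.
        return 403, {"errcode": "M_FORBIDDEN"}

    media = store.get((server_name, media_id))
    if media is None:
        # Content does not exist on the local server.
        return 404, {"errcode": "M_NOT_FOUND"}

    if requester != media.uploader and not is_admin:
        # Only the original uploader or a server admin may delete.
        return 403, {"errcode": "M_FORBIDDEN"}

    del store[(server_name, media_id)]
    return 200, {}
```

In this sketch a second `DELETE` for the same content returns 404, which is one of the two behaviours the proposal allows for overlapping requests.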
|
||
Servers may wish to quarantine the deleted content for some timeframe before | ||
Review comment: we should probably spec the quarantine API (in a different MSC, in the future, eventually) |
||
actually purging it from storage, in order to mitigate abuse. | ||
|
||
If `serverName` is not the local server, the local cache (if any) of the content | ||
should be deleted. This proposal makes no effort to delete the remote content. | ||
|
||
Overlapping or near-overlapping authorised requests to `DELETE` for existing | ||
content may either return 200 or 404 based on implementation choice. | ||
|
||
*XXX: We might want to provide an undelete API too to let users rescue | ||
their content that they accidentally deleted, as you would get on a | ||
typical desktop OS file manager. Perhaps `DELETE` with `?undo=true`?* | ||
|
||
*XXX: We might also want to let admins quarantine rather than delete attachments | ||
without a timelimit by passing `?quarantine=true` or similar.* | ||
|
||
Server admins may choose to mark some content as undeletable in their | ||
implementation (e.g. for sticker packs and other content which should never be | ||
deleted or quarantined.) | ||
|
||
### Encrypted rooms | ||
|
||
There is no way for the server to know which events refer to which MXC URL, so we | ||
leave it up to the client to DELETE any MXC URLs referred to by an event after | ||
Review comment: I'm a bit confused by this: assuming that there is more than one client in a given room, which has responsibility for making the DELETE request? I guess it has to be a client belonging to the original uploader, but what if they go away/stop watching the room/etc?
Reply: It would have to be the original uploader, but indeed, that doesn't help if someone else redacts their event for them and they don't come back and finish it off by deleting the media. |
||
it expires or redacts its local copy of an event. | ||
|
||
We rely on the fact that MXC URLs should not be reused between encrypted | ||
events, as we expect each event to have different message keys to avoid | ||
correlation. As a result, it should be safe to assume each attachment has | ||
only one referring event, and so when a client deems that the event should | ||
be deleted, it is safe to also delete the attachment without breaking any | ||
other events. | ||
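As a rough client-side sketch of the step above: once an event has been decrypted and is about to be expired or redacted locally, the client can collect the MXC URLs it refers to and turn each one into a request path for the proposed `DELETE` endpoint. The function names are hypothetical, and the sketch assumes the usual encrypted-attachment layout where the file lives at `content.file.url` and an optional thumbnail at `content.info.thumbnail_file.url`.

```python
from typing import List
from urllib.parse import quote, urlparse


def mxc_urls_in_content(content: dict) -> List[str]:
    """Collect MXC URLs referenced by a decrypted event's content."""
    candidates = (
        content.get("file", {}).get("url"),
        content.get("info", {}).get("thumbnail_file", {}).get("url"),
    )
    return [
        url
        for url in candidates
        if isinstance(url, str) and url.startswith("mxc://")
    ]


def delete_path(mxc_url: str) -> str:
    """Build the path of the proposed DELETE endpoint for an MXC URL."""
    parsed = urlparse(mxc_url)
    server_name = parsed.netloc
    media_id = parsed.path.lstrip("/")
    return "/_matrix/media/r0/download/{}/{}".format(
        quote(server_name, safe=":"), quote(media_id, safe="")
    )
```

A client would then issue an authenticated `DELETE` to each returned path; only requests made by the original uploader (or a server admin) would succeed.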
|
||
It seems reasonable to consider the special case of clients forwarding | ||
encrypted attachments between rooms as a 'copy by reference' - if the | ||
original event gets deleted, the copies should too. If this isn't desired, | ||
then the attachment should have been reencrypted and stored as a separate | ||
instance in the media repo. | ||
Review comment on lines +68 to +72: This seems a bit unfortunate, but I can't say whether or not that's the right thing to do. I quite often forward encrypted media between rooms, and I also think sticker packs (and similar) could run into this instance.

Re-uploading media does seem to be the privacy-preserving approach compared to having to do some refcounting and linking each message to the media (though I wouldn't guarantee that you couldn't correlate e.g. the filesize of the sticker, etc), but I fear it may be rubbish for UX, especially on slow/mobile connections (where we also might want to avoid reuploading to reduce overall data transmitted).

Finally, if you forward the message and then delete the forwarded copy, how could this be distinguished from deleting the original as it stands?

Deeply conflicted here because I really wish the media store wasn't a monotonically growing area on my disk (without having to resort to heuristic methods like expiring media that hasn't been accessed in a while), but privacy is of course important.

As an unrefined idea: perhaps we could allow asking the homeserver to clone an existing piece of media to a new MXC URI? As a trade-off, my application might prefer to have the server do the copy (the actual implementation would likely be smarter than that; see hard links on a filesystem) and forward the 'new' MXC URI into e.g. a sticker event. The rest of your proposal can still stand: if I delete a message, its MXC URI can get unlinked alongside that. I've lost some privacy potentially (though it's arguable, since if you know what sticker pack I use in my conversations a lot, the filesize probably gives it away by equal measure), but it was my choice and I'm not stopping you or anyone else from having your reencrypt-on-copy implementation when that suits the trade-off better. |
||
|
||
### Unencrypted rooms | ||
|
||
It's common for MXC URLs to be shared between unencrypted events - e.g. | ||
reusing sticker media, or when forwarding messages between rooms, etc. In | ||
this instance, the homeserver (not media server) should count the references | ||
to a given MXC URL by events which refer to it (including state events such as | ||
avatar URLs in `m.room.member` events.) | ||
|
||
If all events which refer to it have been purged or redacted, the HS should delete | ||
the attachment - either by internally deleting the media, or if using an | ||
external media repository, by calling the DELETE api upon it. | ||
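The reference counting described above could look roughly like the following. It is a minimal in-memory sketch, not a statement of how any homeserver implements it: `MediaRefCounter` and the `delete_media` callback are hypothetical, and a real homeserver would persist the references alongside its event store.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class MediaRefCounter:
    """Tracks which events (including state events) refer to each MXC URL."""

    def __init__(self, delete_media: Callable[[str], None]) -> None:
        self._refs: Dict[str, Set[str]] = defaultdict(set)
        # Called when the last reference goes away, e.g. to delete the media
        # internally or to call DELETE on an external media repository.
        self._delete_media = delete_media

    def add_reference(self, mxc_url: str, event_id: str) -> None:
        self._refs[mxc_url].add(event_id)

    def remove_reference(self, mxc_url: str, event_id: str) -> None:
        """Call when an event is purged or redacted."""
        refs = self._refs.get(mxc_url)
        if refs is None:
            return  # never tracked; nothing to do
        refs.discard(event_id)
        if not refs:
            del self._refs[mxc_url]
            self._delete_media(mxc_url)

    def ref_count(self, mxc_url: str) -> int:
        return len(self._refs.get(mxc_url, ()))
```

Keeping a set of event IDs rather than a bare integer makes repeated purges of the same event harmless, at the cost of more storage.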
|
||
If a new event is received over federation which refers to a deleted | ||
attachment, then the server should operate as if it has never heard of that | ||
attachment; pulling it in over federation from whatever the source server is. | ||
This will break if a remote server sends an event referring to a local | ||
MXC URL which may have been deleted, so don't do that - clients on servers | ||
should send MXC URLs which refer to their local server, not remote ones. | ||
|
||
This means that if the local server chooses to expire the source event sooner | ||
than a remote server does, the remote server might end up not being able to | ||
sync the media from the local server and so display a broken attachment. | ||
This feels fairly reasonable; if you don't want people to end up with 404s | ||
on attachments, you shouldn't go deleting things. | ||
|
||
In the scenario of (say) a redacted membership event, it's possible that the | ||
refcount of an unwanted avatar might be greater than zero (due to the avatar | ||
being referenced in multiple rooms), but the room admin may want to still | ||
purge the content from their server. This can be achieved by DELETEing the | ||
content independently from redacting the membership events. | ||
|
||
*N.B. we can't currently distinguish an E2EE attachment with unknown referring | ||
events, from a non-E2EE attachment with zero references which should be GC'd. | ||
So we use mime-types as a heuristic to recognise E2EE attachments, and to stop | ||
them from being GC'd. This would of course be vulnerable to an attacker lying | ||
about their mime-type in order to stop their repository entries being GC'd, | ||
but given E2EE attachments already let you bypass the GC, this doesn't feel | ||
like a big issue.* | ||
|
||
Encrypted attachments should be stored with a mime-type of | ||
`application/aes-encrypted` (to be registered), and attachments | ||
with this mime-type which have never been referenced by an event should | ||
be exempt from GC. For backwards compatibility, this rule may also be | ||
applied to attachments with mime-type of `application/octet-stream`. | ||
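The exemption rule can be written as a single predicate. The sketch below is illustrative only: `is_gc_eligible` and its parameters are hypothetical, and it assumes the server records both the current reference count and whether the attachment has ever been referenced by an event.

```python
ENCRYPTED_MIME_TYPE = "application/aes-encrypted"
LEGACY_MIME_TYPE = "application/octet-stream"


def is_gc_eligible(
    mime_type: str,
    ref_count: int,
    ever_referenced: bool,
    exempt_legacy: bool = True,
) -> bool:
    """Decide whether an attachment may be garbage collected.

    `exempt_legacy` applies the optional backwards-compatibility rule
    to `application/octet-stream` uploads.
    """
    if ref_count > 0:
        return False  # still referenced by at least one event

    exempt_types = {ENCRYPTED_MIME_TYPE}
    if exempt_legacy:
        exempt_types.add(LEGACY_MIME_TYPE)

    if not ever_referenced and mime_type in exempt_types:
        # Probably an E2EE attachment whose referring events the
        # server cannot see, so leave it for the client to DELETE.
        return False

    return True
```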
|
||
## Tradeoffs | ||
|
||
Assuming that encrypted events don't reuse attachments is controversial but | ||
hopefully acceptable. It does mean that stickers in encrypted rooms will end | ||
up getting re-encrypted/decrypted every time, but this is hopefully acceptable | ||
given the resulting improvement in privacy. | ||
|
||
An alternative approach to solving the problem of attachment reuse could be to | ||
expect clients to somehow 'touch' uploaded local attachments whenever they | ||
send an event which refers to them - effectively renewing their retention | ||
lifetime. However, in E2EE rooms this ends up leaking which events refer to | ||
which attachments (or at least claim to), and also gives a vector for abuse | ||
where a malicious client could bypass the retention schedule by repeatedly | ||
retouching a file to keep it alive. | ||
|
||
## Security considerations | ||
|
||
Media repo implementations might want to use `srm` or a similar secure | ||
deletion tool to scrub deleted data off disk. | ||
|
||
If the same attachment is sent multiple times across encrypted events (even if | ||
encrypted separately per event), it's worth noting that the size of the | ||
encrypted attachment and associated traffic patterns will be an easy way to | ||
identify attachment reuse (e.g. who is forwarding a sensitive file to | ||
whom). |