# MSC4222: Adding `state_after` to `/sync`

The current [`/sync`](https://spec.matrix.org/v1.14/client-server-api/#get_matrixclientv3sync) API does not
differentiate between state events in the timeline and updates to state, and so can cause the client's view
of the current state of the room to diverge from the actual state of the room as seen by the server.

The fundamental issue is that clients need to know the current authoritative room state, but the current model
lacks an explicit representation of that. Clients derive state by assuming a linear application of events, for
example:

```
state_before + timeline => state_after
```

However, room state evolves as a DAG (Directed Acyclic Graph), not a linear chain. A simple example illustrates this:

```diagram
  A
  |
  B
 / \
C   D
```

Each of A, B, C, and D is a non-conflicting state event.

- State after C = `{A, B, C}`
- State after D = `{A, B, D}`
- Current state = `{A, B, C, D}`

In this case, C and D are concurrent, so the correct current state includes both. Clients that try to reconstruct
state from a timeline such as `[A, B, C, D]` or `[A, B, D, C]` might trivially compute a union, and for non-conflicting
cases this works.
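The linear model can be sketched as a fold over the timeline, with room state held as a map from `(type, state_key)` to event ID. This is a simplified illustration rather than real client SDK code: the event IDs such as `$A` and the event types assigned to A to D are hypothetical placeholders, and real events carry many more fields.

```python
# Simplified linear model of client-side state tracking (illustrative only).
# Room state maps (event type, state_key) -> event ID.


def apply_linear(state: dict, events: list) -> dict:
    """Return a copy of `state` with each state event in `events` applied in order."""
    new_state = dict(state)
    for ev in events:
        if "state_key" in ev:  # message events have no state_key and do not change state
            new_state[(ev["type"], ev["state_key"])] = ev["event_id"]
    return new_state


def ev(event_id: str, event_type: str) -> dict:
    """Build a minimal state event; IDs and types are hypothetical placeholders."""
    return {"event_id": event_id, "type": event_type, "state_key": ""}


A = ev("$A", "m.room.create")
B = ev("$B", "m.room.join_rules")
C = ev("$C", "m.room.name")
D = ev("$D", "m.room.topic")

# C and D touch different (type, state_key) pairs, so either order yields the same union.
order_1 = apply_linear({}, [A, B, C, D])
order_2 = apply_linear({}, [A, B, D, C])
```

The fold only happens to be order-independent because no two events share a `(type, state_key)` pair; the conflicting case in the next example breaks that assumption.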

However, once conflicting state is involved, resolution is needed. Consider this more complex example:

```diagram
     A
     |
     B
    / \
   C   C'   <-- C' wins via state resolution
    \ / \
     D   E
```

Here, C and C' are conflicting state events: for example, both might set a different `m.room.topic`. Let's say C'
wins according to the server's state resolution rules. D and E are then independent, non-conflicting additions.

- State after C = `{A, B, C}`
- State after D = `{A, B, C', D}`
- State after E = `{A, B, C', E}`
- Current state = `{A, B, C', D, E}`

Now suppose the client first receives timeline events `[A, B, C', E]`. The state it constructs is:

```
{A, B, C', E} ← Correct so far
```

Then it receives a subsequent sync with timeline `[C, D]`, and a `state` section containing only `{B}`. Under the
current `/sync` behavior:

- The timeline includes state event C, which incorrectly replaces C'.
- The client ends up with `{A, B, C, D, E}`, which is **invalid**: it keeps the losing event C instead of C'.

This happens because the client re-applies C from the timeline, unaware that C' had already been resolved and accepted
earlier. Based solely on the timeline, the client has no way to know that C' is supposed to win.
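The divergence can be reproduced with the same kind of simplified model. The sketch assumes the legacy client behavior described above (apply the `state` section, then every state event in the timeline) and compares the result with the current state given for the DAG. Event IDs and the event types for A, B, D and E are hypothetical; C and C' share `m.room.topic` as in the example.

```python
def apply_legacy_sync(state: dict, state_section: list, timeline: list) -> dict:
    """Legacy handling: apply the `state` section, then timeline state events in order."""
    new_state = dict(state)
    for ev in state_section + timeline:
        if "state_key" in ev:
            new_state[(ev["type"], ev["state_key"])] = ev["event_id"]
    return new_state


def ev(event_id: str, event_type: str) -> dict:
    """Minimal state event; IDs and types are hypothetical placeholders."""
    return {"event_id": event_id, "type": event_type, "state_key": ""}


A = ev("$A", "m.room.create")
B = ev("$B", "m.room.join_rules")
C = ev("$C", "m.room.topic")              # loses state resolution
C_PRIME = ev("$C_prime", "m.room.topic")  # wins state resolution
D = ev("$D", "m.room.name")
E = ev("$E", "m.room.avatar")

# First sync: timeline [A, B, C', E] with an empty state section.
client_state = apply_legacy_sync({}, [], [A, B, C_PRIME, E])
# Second sync: timeline [C, D] with a state section of {B}.
client_state = apply_legacy_sync(client_state, [B], [C, D])

# The server's current state for the DAG: {A, B, C', D, E}.
server_state = {
    ("m.room.create", ""): "$A",
    ("m.room.join_rules", ""): "$B",
    ("m.room.topic", ""): "$C_prime",
    ("m.room.name", ""): "$D",
    ("m.room.avatar", ""): "$E",
}

print(client_state[("m.room.topic", "")])  # $C
print(server_state[("m.room.topic", "")])  # $C_prime
```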

In [MSC4186 - Simplified Sliding Sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) this problem is
solved by the equivalent `required_state` section including all state changes between the previous sync and the end of
the current sync, and clients do not update their view of state based on entries in the timeline.


## Proposal

This change is gated behind the client adding a `?use_state_after=true` (the unstable name is
`org.matrix.msc4222.use_state_after`) query param.
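A client opting in might build its request URL along these lines. This is a sketch only: the homeserver base URL and the `since` token are hypothetical, authentication is omitted, and which parameter name a given server accepts depends on whether it implements the unstable or stable form listed under Unstable prefix.

```python
from typing import Optional
from urllib.parse import urlencode

STABLE_PARAM = "use_state_after"
UNSTABLE_PARAM = "org.matrix.msc4222.use_state_after"


def build_sync_url(homeserver: str, since: Optional[str] = None, unstable: bool = True) -> str:
    """Return a /sync URL that opts in to `state_after` (auth headers not included)."""
    params = {UNSTABLE_PARAM if unstable else STABLE_PARAM: "true"}
    if since is not None:
        params["since"] = since
    return f"{homeserver.rstrip('/')}/_matrix/client/v3/sync?{urlencode(params)}"


print(build_sync_url("https://matrix.example.org", since="s72594_4483_1934"))
# https://matrix.example.org/_matrix/client/v3/sync?org.matrix.msc4222.use_state_after=true&since=s72594_4483_1934
```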

When enabled, the Homeserver will **omit** the `state` section in the room response sections. This is replaced by
`state_after` (the unstable field name is `org.matrix.msc4222.state_after`), which will include all state changes between the
previous sync and the *end* of the timeline section of the current sync. This is in contrast to the old `state` section
that only included state changes between the previous sync and the *start* of the timeline section. Note that this does
mean that a new state event will (likely) appear in both the timeline and state sections of the response.
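One way to picture the difference between the two sections is as two diffs taken against the state at the previous sync. The sketch is a simplified model, not how any particular homeserver computes it: each snapshot is assumed to be the already-resolved room state at that point, mapping `(type, state_key)` to event ID, and removed entries are ignored because the response format is a list of events.

```python
def state_delta(since: dict, until: dict) -> list:
    """Event IDs in `until` whose (type, state_key) entry differs from `since`."""
    return [event_id for key, event_id in until.items() if since.get(key) != event_id]


# Hypothetical snapshots around a sync whose timeline contains one new topic event.
at_previous_sync = {("m.room.create", ""): "$create"}
at_timeline_start = {("m.room.create", ""): "$create"}
at_timeline_end = {("m.room.create", ""): "$create", ("m.room.topic", ""): "$topic"}

legacy_state = state_delta(at_previous_sync, at_timeline_start)  # old `state` section
state_after = state_delta(at_previous_sync, at_timeline_end)     # proposed `state_after`
```

Here `legacy_state` is empty while `state_after` holds `$topic`, which matches the shape of Example 1 below.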

This is basically the same as how state is returned in [MSC4186 - Simplified Sliding
Sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186).

Clients **MUST** only update their local state using `state_after` and **NOT** consider the events that appear in the timeline section of `/sync`.

Clients can tell if the server supports this change by whether it returns a `state` or `state_after` section in the
response. Servers that support this change **MUST** return the `state_after` property, even if empty.
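On the client side these rules amount to: look for `state_after` under either name from the Unstable prefix section, update local state only from it, and never from the timeline. A minimal sketch, assuming local state is a dict keyed by `(type, state_key)` and `room` is one room entry from the response (for example under `rooms.join`); the helper names are hypothetical and the legacy fallback is left to the client.

```python
STATE_AFTER_KEYS = ("state_after", "org.matrix.msc4222.state_after")


def get_state_after(room: dict):
    """Return the state_after block, or None if the server sent the legacy `state` section."""
    for key in STATE_AFTER_KEYS:
        if key in room:
            return room[key]
    return None


def apply_room_update(local_state: dict, room: dict) -> dict:
    """Update local state from one /sync room entry, using only `state_after`."""
    block = get_state_after(room)
    if block is None:
        # Server does not support this MSC: fall back to legacy handling (not shown).
        raise NotImplementedError("legacy `state` handling")
    new_state = dict(local_state)
    for ev in block.get("events", []):
        new_state[(ev["type"], ev["state_key"])] = ev
    # room["timeline"] is intentionally not read here: it is history, not state.
    return new_state
```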

### Examples

#### Example 1 - Common case

Let’s take a look at the common case of a state event being sent down a non-gappy incremental sync (`"limited": false`).

<table>
<tr><th>Previously</th><th>Proposed</th></tr>
<tr>
<td>

```json
{
  "timeline": {
    "events": [ {
      "type": "org.matrix.example",
      "state_key": ""
    } ],
    "limited": false
  },
  "state": {
    "events": []
  }
}
```

</td>
<td>

```json
{
  "timeline": {
    "events": [ {
      "type": "org.matrix.example",
      "state_key": ""
    } ],
    "limited": false
  },
  "state_after": {
    "events": [ {
      "type": "org.matrix.example",
      "state_key": ""
    } ]
  }
}
```

</td>
</tr>
</table>

Since the current state of the room will include the new state event, it's included in the `state_after` section.

> [!NOTE]
> In the proposed API the state event comes down in both the timeline section *and* the `state_after` section.
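Applying the proposed Example 1 response with a client that reads only `state_after` puts the event into local state. The sketch parses that response and merges it with a hypothetical `apply_state_after` helper; keying state by `(type, state_key)` is an assumption of the sketch.

```python
import json

PROPOSED = json.loads("""
{
  "timeline": {
    "events": [ { "type": "org.matrix.example", "state_key": "" } ],
    "limited": false
  },
  "state_after": {
    "events": [ { "type": "org.matrix.example", "state_key": "" } ]
  }
}
""")


def apply_state_after(local_state: dict, room: dict) -> dict:
    """Hypothetical helper: merge state_after events into local state by (type, state_key)."""
    new_state = dict(local_state)
    for ev in room["state_after"]["events"]:
        new_state[(ev["type"], ev["state_key"])] = ev
    return new_state


local_state = apply_state_after({}, PROPOSED)
print(sorted(local_state))  # [('org.matrix.example', '')]
```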
#### Example 2 - Receiving “outdated” state

Next, let’s look at what would happen if we receive a state event that does not take effect, i.e. that shouldn’t cause the client to update its state.

<table>
<tr><th>Previously</th><th>Proposed</th></tr>
<tr>
<td>

```json
{
  "timeline": {
    "events": [ {
      "type": "org.matrix.example",
      "state_key": ""
    } ],
    "limited": false
  },
  "state": {
    "events": []
  }
}
```

</td>
<td>

```json
{
  "timeline": {
    "events": [ {
      "type": "org.matrix.example",
      "state_key": ""
    } ],
    "limited": false
  },
  "state_after": {
    "events": []
  }
}
```

</td>
</tr>
</table>

Since the current state of the room does not include the new state event, it's excluded from the `state_after` section.

> [!IMPORTANT]
> Even though both responses look very similar, the client **MUST NOT** update its state with the event from the timeline section when using `state_after`.
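The contrast in Example 2 can be made concrete by running both client behaviors over the same local state. In this sketch the room already holds a current `org.matrix.example` event; the event IDs `$current` and `$outdated` are hypothetical additions, since the example responses omit them.

```python
timeline = {
    "events": [{"event_id": "$outdated", "type": "org.matrix.example", "state_key": ""}],
    "limited": False,
}
legacy_response = {"timeline": timeline, "state": {"events": []}}
proposed_response = {"timeline": timeline, "state_after": {"events": []}}

KEY = ("org.matrix.example", "")
local_state = {KEY: "$current"}


def legacy_apply(state: dict, room: dict) -> dict:
    """Old behavior: apply the `state` section, then every timeline state event."""
    new_state = dict(state)
    for ev in room["state"]["events"] + room["timeline"]["events"]:
        if "state_key" in ev:
            new_state[(ev["type"], ev["state_key"])] = ev["event_id"]
    return new_state


def state_after_apply(state: dict, room: dict) -> dict:
    """Proposed behavior: only `state_after` changes local state."""
    new_state = dict(state)
    for ev in room["state_after"]["events"]:
        new_state[(ev["type"], ev["state_key"])] = ev["event_id"]
    return new_state


print(legacy_apply(local_state, legacy_response)[KEY])         # $outdated (wrong)
print(state_after_apply(local_state, proposed_response)[KEY])  # $current
```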

## Potential issues

With the proposed API the common case for receiving a state update will cause the event to come down in both the
`timeline` and `state_after` sections, potentially increasing bandwidth usage. However, it is common for the HTTP responses to
be compressed, heavily reducing the impact of having duplicated data.
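A rough way to check the compression argument is to compress a response body with and without the duplicated event. This sketch uses Python's `zlib`, which implements DEFLATE, the algorithm behind the HTTP `gzip` and `deflate` content encodings, on a hypothetical `m.room.topic` event. It prints sizes rather than quoting figures, because the exact numbers depend on the payload and compressor settings.

```python
import json
import zlib

# Hypothetical state event; all values are placeholders.
event = {
    "type": "m.room.topic",
    "state_key": "",
    "sender": "@alice:example.org",
    "event_id": "$placeholder_event_id",
    "origin_server_ts": 1700000000000,
    "content": {"topic": "Weekly planning; agenda is in the pinned message"},
}

timeline = {"events": [event], "limited": False}
without_dup = {"timeline": timeline, "state_after": {"events": []}}
with_dup = {"timeline": timeline, "state_after": {"events": [event]}}


def sizes(body: dict) -> tuple:
    """Return (raw, compressed) byte lengths of the JSON-encoded body."""
    raw = json.dumps(body).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


raw_a, packed_a = sizes(without_dup)
raw_b, packed_b = sizes(with_dup)
print(f"extra bytes uncompressed: {raw_b - raw_a}, compressed: {packed_b - packed_a}")
```

The second copy of the event can be encoded by DEFLATE as a back-reference to the first, which is why the compressed overhead is expected to be far smaller than the raw overhead.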

Both before and after this proposal, clients cannot reliably determine exactly when in the
timeline the state changed (e.g. to work out which message should show a user's previous or updated
display name; note that some clients, such as Element, have moved away from this UX). This is because
the accurate picture of the state at an event is calculated by the server based on the room
DAG, including the state resolution process, and not based on a linear list of state updates.

This proposal ensures that the client has a more accurate view of the room state *after the sync has
finished*, but it does not provide any more information about the *history of state* as it relates
to events in the timeline. Clients attempting to build a best-effort view of this history by walking
the timeline may still do so, with the same caveats as before about correctness, but they should be
sure to make their view of the final state consistent with the changes provided in `state_after`.
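A best-effort history view can coexist with `state_after` if the guessed per-event state is kept separate from the authoritative final state. The sketch reuses the conflicting-topic scenario from the motivation (C' already known locally, then a sync with timeline `[C, D]`). Event IDs and types are hypothetical, and `state_after` is assumed to contain only D, since C' was already delivered and C does not win.

```python
def ev(event_id: str, event_type: str) -> dict:
    """Minimal state event; IDs and types are hypothetical placeholders."""
    return {"event_id": event_id, "type": event_type, "state_key": ""}


def walk_timeline(local_state: dict, timeline: list, state_after: list):
    """Return (history, final_state).

    history[i] is a best-effort *guess* at the state when timeline[i] was sent,
    built by linear application; final_state is derived only from state_after.
    """
    guess = dict(local_state)
    history = []
    for event in timeline:
        history.append(dict(guess))
        if "state_key" in event:
            guess[(event["type"], event["state_key"])] = event["event_id"]
    final_state = dict(local_state)
    for event in state_after:
        final_state[(event["type"], event["state_key"])] = event["event_id"]
    return history, final_state


local_state = {("m.room.topic", ""): "$C_prime"}
C = ev("$C", "m.room.topic")
D = ev("$D", "m.room.name")

history, final_state = walk_timeline(local_state, [C, D], state_after=[D])
print(history[1][("m.room.topic", "")])   # $C (a guess, only suitable for display hints)
print(final_state[("m.room.topic", "")])  # $C_prime (authoritative)
```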

The format of returned state in `state_after` in this proposal is a list of events. This
does not allow the server to indicate if an entry has been removed from the state. As with
[MSC4186 - Simplified Sliding Sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186),
this limitation is acknowledged but not addressed here. This is not a new issue and is left for
resolution in a future MSC.


## Alternatives

There are a number of options for encoding the same information in different ways. For example, the response could
include both the `state` section and a `state_delta` section, where `state_delta` would contain any changes that need to be
applied to the client-calculated state to correct it. However, since
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) is likely to replace the current `/sync` API, we may as
well use the same mechanism. This also has the benefit of showing that the proposed API shape can be successfully
implemented by clients, as MSC4186 is implemented and in use by clients.

Another option would be for server implementations to try to fudge the state and timeline responses to ensure that
clients come to the correct view of state. For example, if the server detects that a sync response will cause the client
to come to an incorrect view of state, it could either a) "fix up" the state in the `state` section of the *next* sync
response, or b) remove or add old state events to the timeline section. While both of these approaches are viable, they
are inferior to simply telling the client the correct information in the first place. Since clients will need to be
updated to handle the new behavior for future sync APIs anyway, there is little benefit in delaying those updates.

We could also do nothing, and instead wait for [MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186)
(or equivalent) to land and for clients to update to it.


## Security considerations

There are no security concerns with this proposal, as it simply encodes the same information sent to clients in a
different way.

## Unstable prefix

| Name | Stable name | Unstable name |
| - | - | - |
| Query param | `use_state_after` | `org.matrix.msc4222.use_state_after` |
| Room response field | `state_after` | `org.matrix.msc4222.state_after` |

## Dependencies

None