@@ -13,9 +13,6 @@ is, a Coordinator that is less involved in processing user requests. This will
lead to a more scalable _and_ more isolated system, without grander ambitions
of implementing full horizontal scalability and isolation of Materialize.
- - Decoupled Compute Controller: https://github.com/MaterializeInc/materialize/pull/29559
- - decoupled storage controller: [decoupled storage controller](20240117_decoupled_storage_controller.md)
-
## The Problem
The Coordinator is a component of the ADAPTER layer that is sequentializing
@@ -195,14 +192,65 @@ channels, etc., we can say that this is currently the bottleneck:
## Proposal
+ As is likely clear by now, we propose to work towards a Small Coordinator. We
+ propose to start that work _now_ because of the urgency, but to do it
+ incrementally, so that over time we arrive at a Small Coordinator. We will
+ not outline a comprehensive step-by-step implementation plan. Instead, we
+ provide examples of workflows, show how each can move from big to small, and
+ then give a rough implementation plan for the immediate next steps.
+
+ Overall, we should use a data-driven approach: we can use the
+ message-processing metrics to find where we spend time on the Coordinator main
+ loop, both at steady state and when processing certain important or
+ representative workloads. We then tackle those usages of the loop and the
+ associated commands. Additionally, we can lean into recency bias and let bugs
+ or observed lapses in isolation guide which other parts we need to address.
+
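As a rough illustration of the data-driven approach above, the following Rust sketch aggregates time spent per message kind and ranks the kinds by total time. `MainLoopMetrics`, `record`, `ranked`, and the message-kind labels are hypothetical names invented for this example; they are not the existing message-processing metrics.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical aggregate of main-loop time, keyed by message kind.
#[derive(Default)]
struct MainLoopMetrics {
    total: HashMap<&'static str, Duration>,
}

impl MainLoopMetrics {
    // Adds one observation of time spent handling a message of `kind`.
    fn record(&mut self, kind: &'static str, elapsed: Duration) {
        *self.total.entry(kind).or_default() += elapsed;
    }

    // Returns message kinds ordered by total time, largest first.
    // Ties are broken by name so the order is deterministic.
    fn ranked(&self) -> Vec<(&'static str, Duration)> {
        let mut ranked: Vec<_> = self.total.iter().map(|(k, d)| (*k, *d)).collect();
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        ranked
    }
}
```

Under this sketch, the kinds at the top of `ranked()` for a representative workload would be the first candidates for moving work off the main loop.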
+ ### Processing SELECTs
+
+ For SELECT, the bulk of the work is currently driven by the main loop. The
+ frontend sends an EXECUTE SELECT message, and the Coordinator then does the
+ sequencing, fires off staged tasks, and talks to the controller(s). This
+ diagram visualizes the workflow for the current Big Coordinator. We can clearly
+ see that a lot of time is spent on the main loop:
+
<img src="./static/a_small_coordinator/big-coord-select.png" alt="Big Coordinator - processing SELECT" width="50%">
+
+ We propose this approach for moving towards a Small Coordinator:
+
+ - Determine what bundles of data and access are needed for processing SELECT.
+ - Introduce interfaces/clients (or re-use existing ones) and small commands
+   that allow the frontend to get them from the Coordinator.
+ - Move the main driver code for executing SELECT to the frontend, which then
+   uses the new interfaces and commands to retrieve what it needs.
+
+ Concretely, we think for SELECT the required interfaces and commands are:
+
+ - `get_catalog_snapshot` -> `Catalog` (exists)
+ - `get_timestamp_oracle` -> `TimestampOracle` (exists)
+ - `get_controller_clients` -> `ComputeControllerClient` (roughly exists already
+   after [PR #29559](https://github.com/MaterializeInc/materialize/pull/29559)),
+   `StorageCollections` (exists, from [decoupled storage
+   controller](20240117_decoupled_storage_controller.md))
+
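The shape of this proposal can be sketched in Rust as follows. This is a minimal model under stated assumptions, not Materialize's actual code: the type names are borrowed from the list above, but their fields, the `Command` enum, `spawn_coordinator`, `ask`, `execute_select`, and `peek` are all hypothetical, and plain threads and channels stand in for the real async machinery. The point it illustrates is that the main loop only hands out shared handles, while the SELECT driver runs on the frontend side.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins; the real types carry far more state.
struct Catalog {
    collections: BTreeMap<String, u64>, // object name -> collection id
}
struct TimestampOracle {
    read_ts: u64,
}
struct ComputeControllerClient;

impl ComputeControllerClient {
    // Stand-in for issuing a peek directly against the controller.
    fn peek(&self, collection: u64, ts: u64) -> String {
        format!("rows of collection {} as of {}", collection, ts)
    }
}

// Small commands: each one only asks for a shared handle.
enum Command {
    GetCatalogSnapshot(Sender<Arc<Catalog>>),
    GetTimestampOracle(Sender<Arc<TimestampOracle>>),
    GetControllerClients(Sender<Arc<ComputeControllerClient>>),
}

// The "small" main loop: no sequencing, just cheap handle lookups.
fn spawn_coordinator() -> Sender<Command> {
    let (cmd_tx, cmd_rx) = channel::<Command>();
    thread::spawn(move || {
        let catalog = Arc::new(Catalog {
            collections: BTreeMap::from([("t".to_string(), 1)]),
        });
        let oracle = Arc::new(TimestampOracle { read_ts: 42 });
        let compute = Arc::new(ComputeControllerClient);
        for cmd in cmd_rx {
            // A send only fails if the frontend went away; ignore that here.
            match cmd {
                Command::GetCatalogSnapshot(tx) => drop(tx.send(Arc::clone(&catalog))),
                Command::GetTimestampOracle(tx) => drop(tx.send(Arc::clone(&oracle))),
                Command::GetControllerClients(tx) => drop(tx.send(Arc::clone(&compute))),
            }
        }
    });
    cmd_tx
}

// Sends one small command and waits for its reply.
fn ask<T>(coord: &Sender<Command>, make: impl FnOnce(Sender<T>) -> Command) -> Option<T> {
    let (tx, rx) = channel();
    coord.send(make(tx)).ok()?;
    rx.recv().ok()
}

// Driver code that runs on the frontend's per-connection task.
fn execute_select(coord: &Sender<Command>, name: &str) -> Option<String> {
    let catalog = ask(coord, Command::GetCatalogSnapshot)?;
    let oracle = ask(coord, Command::GetTimestampOracle)?;
    let compute = ask(coord, Command::GetControllerClients)?;
    let id = *catalog.collections.get(name)?;
    Some(compute.peek(id, oracle.read_ts))
}
```

In this model, a frontend would call `spawn_coordinator()` once and then `execute_select(&coord, "t")` per query; name resolution, timestamp selection, and the peek itself all happen outside the main loop.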
+ The workflow after those changes will look like this. The work is "pushed out"
+ from the main loop to the frontend (which already has a task/thread per
+ connection) and to the controller. Much less time is spent on the Coordinator
+ main loop:
+
<img src="./static/a_small_coordinator/small-coord-select.png" alt="Small Coordinator - processing SELECT" width="50%">
+ ### Controller Processing
+
<img src="./static/a_small_coordinator/big-coord-controller.png" alt="Big Coordinator - controller processing" width="50%">
<img src="./static/a_small_coordinator/small-coord-controller.png" alt="Small Coordinator - controller processing" width="50%">

### Implementation Plan
+ - Decoupled Compute Controller: https://github.com/MaterializeInc/materialize/pull/29559
+ - decoupled storage controller: [decoupled storage controller](20240117_decoupled_storage_controller.md)
+
## Alternatives
An alternative is that we keep the big Coordinator and invest more into staged