Commit 211b670

flesh out SELECT section
1 parent e981f88 commit 211b670

1 file changed

+52
-3
lines changed

doc/developer/design/20250717_a_small_coordinator_more_scalable_isolated_materialize.md

Lines changed: 52 additions & 3 deletions
@@ -13,9 +13,6 @@ is, a Coordinator that is less involved in processing user requests. This will
1313
lead to a more scalable _and_ more isolated system, without grander ambitions
1414
of implementing full horizontal scalability and isolation of Materialize.
1515

16-
- Decoupled Compute Controller: https://github.com/MaterializeInc/materialize/pull/29559
17-
- decoupled storage controller: [decoupled storage controller](20240117_decoupled_storage_controller.md)
18-
1916
## The Problem
2017

2118
The Coordinator is a component of the ADAPTER layer that is sequentializing
@@ -195,14 +192,66 @@ channels, etc., we can say that this is currently the bottleneck:
195192

196193
## Proposal
197194

195+
As is likely clear by now, we propose to work towards a Small Coordinator. We
196+
propose to start that work _now_ because of the urgency, but do it
197+
incrementally, so that over time we will arrive at the goal of a Small
198+
Coordinator. We will not outline a comprehensive step-by-step implementation
199+
plan but instead we will provide examples of workflows and how we can move from
200+
big to small and then provide a rough implementation plan for immediate next
201+
steps.
202+
203+
Overall, we should use a data-driven approach: we can use the
204+
message-processing metrics to find where we spend time on the Coordinator main
205+
loop, both at steady state and when processing certain important or
206+
representative workloads. We then tackle those usages of the loop and the
207+
associated commands. Additionally, we can lean into recency bias and let bugs
208+
or observations of lack of isolation guide what other parts we need to address.
209+
210+
### Processing SELECTs
211+
212+
For SELECT, the bulk of the work is currently driven by the main loop. The
213+
frontend sends an EXECUTE SELECT message and the coordinator then does the
214+
sequencing, firing off of staged tasks, and talking to the controller(s). This
215+
diagram visualizes the workflow for the current Big Coordinator. We can clearly
216+
see that a lot of time is spent on the main loop:
217+
198218
<img src="./static/a_small_coordinator/big-coord-select.png" alt="Big Coordinator - processing SELECT" width="50%">
219+
220+
We propose this approach for moving towards a Small Coordinator:
221+
222+
- Determine what bundles of data and access are needed for processing SELECT.
223+
- Introduce interfaces/clients (or re-use existing ones) and small commands
224+
that allow the frontend to get them from the Coordinator.
225+
- Move the main driver code for executing SELECT to the frontend, which uses the
226+
new interfaces and commands to retrieve what it needs.
227+
228+
Concretely, we think the interfaces and commands required for SELECT are:
229+
230+
- `get_catalog_snapshot` -> `Catalog` (exists)
231+
- `get_timestamp_oracle` -> `TimestampOracle` (exists)
232+
- `get_controller_clients` -> `ComputeControllerClient` (roughly exists already
233+
after [PR: decoupled compute
234+
controller](https://github.com/MaterializeInc/materialize/pull/29559)),
235+
`StorageCollections` (exists, from [design: decoupled storage
236+
controller](20240117_decoupled_storage_controller.md))
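As a rough illustration of the list above, the following is a minimal sketch of how the frontend could drive a SELECT using only small "hand me a handle" commands. Everything in it is an assumption made for illustration: the `Command` enum, `spawn_coordinator`, `frontend_select`, and the stub `Catalog`, `TimestampOracle`, `ComputeControllerClient`, and `ControllerClients` types are hypothetical stand-ins that only borrow names from the list, not the real Materialize APIs. `StorageCollections` is left out for brevity, and the "peek" simply reads from the catalog snapshot.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a catalog snapshot; here it also holds the data.
#[derive(Debug)]
pub struct Catalog {
    pub tables: BTreeMap<String, Vec<i64>>,
}

/// Hypothetical stand-in for a shared timestamp oracle handle.
#[derive(Debug)]
pub struct TimestampOracle {
    now: AtomicU64,
}

impl TimestampOracle {
    pub fn read_ts(&self) -> u64 {
        self.now.load(Ordering::SeqCst)
    }
}

/// Hypothetical stand-in for a compute controller client.
#[derive(Clone, Debug)]
pub struct ComputeControllerClient;

impl ComputeControllerClient {
    /// Pretend peek: this sketch reads the rows out of the catalog snapshot.
    pub fn peek(&self, catalog: &Catalog, table: &str, _ts: u64) -> Option<Vec<i64>> {
        catalog.tables.get(table).cloned()
    }
}

#[derive(Clone, Debug)]
pub struct ControllerClients {
    pub compute: ComputeControllerClient,
}

/// Small commands: each one only hands out a handle, so the main loop does
/// a constant, tiny amount of work per message.
pub enum Command {
    GetCatalogSnapshot(Sender<Arc<Catalog>>),
    GetTimestampOracle(Sender<Arc<TimestampOracle>>),
    GetControllerClients(Sender<ControllerClients>),
}

/// Spawns the (small) coordinator main loop and returns a handle to it.
pub fn spawn_coordinator() -> Sender<Command> {
    let (tx, rx) = channel::<Command>();
    thread::spawn(move || {
        let mut tables = BTreeMap::new();
        tables.insert("t".to_string(), vec![1, 2, 3]);
        let catalog = Arc::new(Catalog { tables });
        let oracle = Arc::new(TimestampOracle { now: AtomicU64::new(42) });
        let clients = ControllerClients { compute: ComputeControllerClient };
        // The loop ends once every sender has been dropped.
        for cmd in rx {
            match cmd {
                Command::GetCatalogSnapshot(reply) => {
                    let _ = reply.send(Arc::clone(&catalog));
                }
                Command::GetTimestampOracle(reply) => {
                    let _ = reply.send(Arc::clone(&oracle));
                }
                Command::GetControllerClients(reply) => {
                    let _ = reply.send(clients.clone());
                }
            }
        }
    });
    tx
}

/// Runs on the per-connection frontend task: fetch handles, then do the
/// sequencing work (timestamp selection, peek) off the main loop.
pub fn frontend_select(coord: &Sender<Command>, table: &str) -> Option<(u64, Vec<i64>)> {
    let (tx, rx) = channel();
    coord.send(Command::GetCatalogSnapshot(tx)).ok()?;
    let catalog = rx.recv().ok()?;

    let (tx, rx) = channel();
    coord.send(Command::GetTimestampOracle(tx)).ok()?;
    let oracle = rx.recv().ok()?;

    let (tx, rx) = channel();
    coord.send(Command::GetControllerClients(tx)).ok()?;
    let clients = rx.recv().ok()?;

    let ts = oracle.read_ts();
    let rows = clients.compute.peek(&catalog, table, ts)?;
    Some((ts, rows))
}
```

In this sketch, calling `frontend_select(&spawn_coordinator(), "t")` makes three short round trips to the main loop, and all remaining work happens on the calling thread. A real frontend would presumably cache these handles per connection rather than re-fetch them for every query.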
237+
238+
The workflow after those changes will look like this. The work is "pushed out"
239+
from the main loop to the frontend (which already has a task/thread per
240+
connection) and to the controller. Much less time is spent on the coordinator
241+
main loop:
242+
199243
<img src="./static/a_small_coordinator/small-coord-select.png" alt="Small Coordinator - processing SELECT" width="50%">
200244

245+
### Controller Processing
246+
201247
<img src="./static/a_small_coordinator/big-coord-controller.png" alt="Big Coordinator - controller processing" width="50%">
202248
<img src="./static/a_small_coordinator/small-coord-controller.png" alt="Small Coordinator - controller processing" width="50%">
203249

204250
### Implementation Plan
205251

252+
- Decoupled Compute Controller: https://github.com/MaterializeInc/materialize/pull/29559
253+
- decoupled storage controller: [decoupled storage controller](20240117_decoupled_storage_controller.md)
254+
206255
## Alternatives
207256

208257
An alternative is that we keep the big Coordinator and invest more into staged
