doc/design: A Small Coordinator for a More Scalable and Isolated Materialize #33082


Open · wants to merge 13 commits into main
Conversation

@aljoscha aljoscha commented Jul 18, 2025

@aljoscha aljoscha force-pushed the design-small-coordinator branch 9 times, most recently from 81383b7 to 211b670 on July 21, 2025 10:12
@aljoscha aljoscha changed the title from "DRAFT: A Small Coordinator for A More Scalable and Isolated Materialize" to "doc/design: A Small Coordinator for A More Scalable and Isolated Materialize" on Jul 21, 2025
@aljoscha aljoscha changed the title from "doc/design: A Small Coordinator for A More Scalable and Isolated Materialize" to "doc/design: A Small Coordinator for a More Scalable and Isolated Materialize" on Jul 21, 2025
@aljoscha aljoscha force-pushed the design-small-coordinator branch from d833825 to f16247d on July 21, 2025 19:24
@aljoscha aljoscha marked this pull request as ready for review July 21, 2025 19:24
Comment on lines 139 to 141
has to be involved when absolutely necessary. A good analogy might be CISC vs
RISC instruction sets, where CISC has fewer, more complex opcodes and RISC has
possibly more, but simpler opcodes.
Contributor

Nit: I think you mixed up the analogy here! CISC ISAs usually have more opcodes than RISC ISAs.

I think you do want the coordinator to be "RISC" in terms of the complexity (less) and number (fewer) of commands. But RISC machines also have to execute more commands than CISC machines to implement the same logic, whereas I think we also want the coordinator to execute fewer commands than it does now.

Contributor Author

Yeah, I messed up that one part. CISC actually has more opcodes, and those opcodes are more complicated. So the analogy works quite well.

Where before the frontend would send one EXECUTE SELECT, it would now send the smaller GET CATALOG, GET THIS CLIENT, GET THAT CLIENT messages. But the frontend would also keep those clients around, so it doesn't have to run those commands for every peek.
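A minimal sketch of that command split, with entirely hypothetical names (none of these are Materialize's actual API): one big command today versus a few small, cacheable ones.

```rust
// Hypothetical sketch of the "RISC-style" command split discussed above.
// All names here are illustrative, not Materialize's real command types.

#[derive(Debug)]
enum BigCommand {
    // Today: the Coordinator plans, picks a timestamp, and issues the peek
    // itself, all on its single main loop.
    ExecuteSelect { sql: String },
}

#[derive(Debug)]
enum SmallCommand {
    // Proposed: the frontend asks the Coordinator only for the small pieces
    // it needs, and caches the results.
    GetCatalogSnapshot,
    GetComputeClient { instance: String },
}

// The frontend runs these once and keeps the returned catalog snapshot and
// compute client around, so later peeks skip the Coordinator entirely.
fn commands_for_first_peek() -> Vec<SmallCommand> {
    vec![
        SmallCommand::GetCatalogSnapshot,
        SmallCommand::GetComputeClient { instance: "default".into() },
    ]
}

fn main() {
    let big = BigCommand::ExecuteSelect { sql: "SELECT 1".into() };
    println!("before: {:?}", big);
    println!("after: {} small commands, then cached clients", commands_for_first_peek().len());
}
```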

Comment on lines +175 to +176
- `controller_ready(compute)`: the compute controller signaling that a peek
result is ready and the Coordinator needs to act.
Contributor

Both controllers also become ready when frontiers advance. Probably doesn't show up here because it gets drowned out by the peek responses, but it is also something we should fix. The reason we wake up on all frontier changes is that there might have been watches registered for query lifecycle tracking and we need to check those somewhere. But we shouldn't check them on the coordinator main loop.
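The wakeup pattern described here can be sketched as follows; event and function names are hypothetical. The point is that frontier advancements wake the main loop only so that lifecycle watches can be checked, and that check could live off the main loop.

```rust
use std::sync::mpsc;

// Hypothetical event types for the Coordinator's main-loop wakeups.
#[derive(Debug)]
enum ControllerEvent {
    PeekResponse { conn_id: u32 },
    FrontierAdvanced { collection: String },
}

fn needs_coordinator_action(ev: &ControllerEvent) -> bool {
    match ev {
        // A finished peek genuinely needs the Coordinator to forward results.
        ControllerEvent::PeekResponse { .. } => true,
        // Frontier advances only matter for registered lifecycle watches;
        // checking them elsewhere would keep the main loop quiet.
        ControllerEvent::FrontierAdvanced { .. } => false,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(ControllerEvent::FrontierAdvanced { collection: "t1".into() }).unwrap();
    tx.send(ControllerEvent::FrontierAdvanced { collection: "t2".into() }).unwrap();
    tx.send(ControllerEvent::PeekResponse { conn_id: 7 }).unwrap();
    drop(tx);
    // Only one of the three wakeups actually needs Coordinator action.
    let actionable = rx.iter().filter(needs_coordinator_action).count();
    println!("actionable wakeups: {actionable}");
}
```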

Comment on lines +266 to +269
controller](https://github.com/MaterializeInc/materialize/pull/29559). With
that work fully realized, both for the storage controller and the remaining
compute controller moments a visualization of the workflow would look like
this:
Contributor

I think it already basically looks like this for the compute controller. At least it doesn't really do anything on process, just maybe sends a "run maintenance" command to the instance tasks and then returns a stashed response, if it has any.

Sending the "run maintenance" command is cheap. But we can also remove it, I think, by giving each instance task its own maintenance ticker.
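The "own maintenance ticker" idea might look like this sketch, using plain threads and channels as stand-ins for the real instance tasks (all names hypothetical): each instance ticks itself, so the controller never has to send a "run maintenance" command on its own wakeups.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical instance task that owns its own maintenance ticker.
fn spawn_instance_task(name: &'static str, tick: Duration, done: mpsc::Sender<&'static str>) {
    thread::spawn(move || {
        // The real task would loop forever; one tick suffices for the sketch.
        thread::sleep(tick);
        // Here the task would do its maintenance work (compaction etc.),
        // without the controller having told it to.
        done.send(name).unwrap();
    });
}

fn run() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    spawn_instance_task("instance-a", Duration::from_millis(5), tx.clone());
    spawn_instance_task("instance-b", Duration::from_millis(5), tx);
    // rx.iter() ends once both tasks have sent and dropped their senders.
    let mut ticked: Vec<_> = rx.iter().collect();
    ticked.sort();
    ticked
}

fn main() {
    println!("maintained: {:?}", run());
}
```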

layer.
- We don't want to improve throughput benchmark numbers, only remove the
Coordinator as a bottleneck. Our work might increase throughput numbers or
reveal similar bottlenecks in other parts of the system.
Contributor

I think moving the current work of peek sequencing to the frontend could increase QPS even without horizontal scaling, because:

- Doing most of the peek sequencing work on the per-session frontend task would allow us to at least vertically scale envd inside a single machine when a particular user wants a bit more QPS.
- I imagine that the current staged execution has some overhead, which would disappear if we had straight-line code instead.
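The "straight-line code on per-session tasks" point can be sketched like this, with threads standing in for tasks and entirely hypothetical function names: each session runs its sequencing steps top to bottom instead of re-entering a staged state machine on one shared loop, so sessions can spread across cores.

```rust
use std::thread;

// Hypothetical straight-line peek sequencing for one session: no stages,
// no hand-off back to a central loop between steps.
fn sequence_peek(session: u32) -> String {
    let plan = format!("plan(session={session})");
    let ts = "choose_timestamp()";
    format!("peek[{plan}, {ts}]")
}

fn main() {
    // One task per session; each sequences its peek independently.
    let handles: Vec<_> = (0..4)
        .map(|s| thread::spawn(move || sequence_peek(s)))
        .collect();
    let peeks: Vec<String> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("issued {} peeks concurrently", peeks.len());
}
```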
