@@ -14,6 +14,9 @@ each described by a zero-sized struct that can be debug printed as necessary.
You can see the scheme description, represented as `V0Scheme`, for trust
quorum version 0 in link:./src/schemes/v0/mod.rs[].

+ The bootstore also provides a mechanism for replicating and storing early
+ networking configuration required to bring up the rest of the control plane.
+ This is implemented in the `Node` type, at a layer above the LRTQ FSM.

== Rack Secret Generation and Share distribution

@@ -23,20 +26,13 @@ all live under the link:./src/trust_quorum[] directory. We anticipate that
the `RackSecret` generation algorithms will remain the same across all schemes
for the foreseeable future. This will not be the case for the `packages` that
distribute key shares and metadata to different sleds, and so we version these.
- We may decide to move these "packages" into the specific scheme subdirectories
- because of this, but that is not done in the current code as written. The
- code is divided across a somewhat artificial line of protocol, provided in the
- schemes, and data, provided by the packages.
-
- While not yet done, the rack secret will be used as input key material to the
- link:../key-manager[] so that disks can be decrypted across sleds to allow the
- rack to boot.
-
- Thus while the schemes to distribute key shares and rotate and reconstruct the
- rack secret evolve to become more secure over time, the code using the rack
- secret can be kept consistent as it is just fed the same shape of input key
- material with each iteration.

+ The rack secret is used as input key material to the link:../key-manager[] so
+ that disks can be decrypted across sleds to allow the rack to boot. This allows
+ the schemes to distribute key shares and rotate and reconstruct the rack secret
+ to evolve and become more secure over time. It also enables the code using the
+ rack secret to be kept consistent as long as it is fed the same shape of input
+ key material.

== Scheme V0

@@ -50,45 +46,6 @@ performs no I/O. This allows us to write property based tests that simulate an
entire cluster of sleds operating in concert and ensure that any test failures
shrink correctly.

- === Future work
-
- The code that implements network IO is not currently written. It will be a thin
- shim of async code on top of the v0 `Fsm`. For this protocol to work, peer sled
- bootstrap addresses will be discovered via DDM, and persistent TCP connections
- established using a "downstream" strategy, where sleds with higher IP addresses
- connect to those with lower IP addresses. The first thing peers do when they
- connect to each other is exchange `Hello` messages to identify the scheme and
- protocol version. This is described in link:./src/schemes/mod.rs[], although
- it is not relevant to the FSM which operates solely inside the V0 scheme.
- After protocol negotiation each sides sends an `Identify` message as described
- in link:./src/schemes/v0/messages.rs[]. This servers to inform each sled the
- identity of its peer. The `Baseboard` serves as the unique identity for a peer
- and does not change over the lifetime of a sled. This message is only sent over
- the network links and is used by the networking layer to map TCP connections
- to peer IDs. The protocol FSM itself, however, is agnostic to the network layer
- other than knowing it uses reliable stream based communication. Protocol
- messages are only addressed via `Baseboard` and then routed appropriately by the
- network layer.
-
- The persistence of FSM state, shares, and metadata is also not yet implemented.
- We anticipate making the `PersistentState` or a serialized version of
- it `Ledgerable` and reusing the `Ledger` code in
- link:../sled-agent/src/ledger.rs[]. For this to work, we'll have to extract the
- ledger code from `sled-agent` and put it in `omicron-common`.
-
- Once the network layer and persistence are implemented, the bootstore itself
- will be complete for scheme v0. We will then need to plug it into the bootstrap
- agent so that that sleds can communicate and recompute a rack secret. To decrypt
- the drives, we will need to create a `KeyRetriever`
- (see link:../key-manager/src/lib.rs[]) and plug that into `KeyManager`
- construction in the bootstrap agent. And of course, RSS will have to trigger the
- bootstore to generate the rack secret and and distribute the key share packages
- in the first place.
-
- The previous 3 paragraphs describe future work, although smaller and simpler
- than the bootstore protocol itself (hopefully). The remainder of this section
- will discuss the v0 Scheme.
-
=== Threat Model and Security Goals

The trust quorum v0 scheme is limited in capability by the fact that a lot of
@@ -133,7 +90,7 @@ provide a key for each dataset on day 1. The question then becomes where do we
get the input key material from. We essentially had 4 options:

. Hardcode it - this is what is currently committed as the
- `LocalSecretRetreiver `
+ `HardcodedSecretRetreiver `
. Store random values constructed at RSS time on the M.2s and use them locally
. Derive keys from VPD data unique to a sled
. Build a simplified trust quorum scheme over untrusted channels on the
@@ -220,12 +177,12 @@ Learners will rotate through known peers until they find one that has a share.
=== Testing strategy

- The primary method of testing is generative testing via https://proptest-
- rs.github.io/proptest/intro.html[proptest]. There are two property based tests:
- one for running as the `rack_coordinator` and one for running as a `learner`.
- Once the initial setup is performed to either initialize the rack, or learn
- a `LearnedSharePkg`, the tests largely share the same behavior in terms of
- processing generated `Action`s.
+ The primary method of testing is generative testing via
+ https://proptest-rs.github.io/proptest/intro.html[proptest]. There are two
+ property based tests: one for running as the `rack_coordinator` and one for
+ running as a `learner`. Once the initial setup is performed to either initialize
+ the rack, or learn a `LearnedSharePkg`, the tests largely share the same
+ behavior in terms of processing generated `Action`s.
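
As a rough illustration of the testing approach described above, the sketch below feeds a generated sequence of actions to a state machine that performs no I/O and checks invariants after every step. It is not the bootstore's test code: the real tests use proptest, which this sketch replaces with a tiny deterministic generator so that it has no dependencies. `ToyFsm`, the `Action` variants, `Lcg`, `gen_actions`, and `check` are all hypothetical names chosen for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical inputs a test can feed to the state machine.
/// (The real `Action` type in the bootstore tests is not shown here.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Connect(u8),
    Disconnect(u8),
    Tick,
}

/// A toy stand-in for an FSM: it tracks connected peers and a logical clock.
#[derive(Debug, Default)]
struct ToyFsm {
    connected: BTreeSet<u8>,
    ticks: u64,
}

impl ToyFsm {
    /// Pure state transition: no sockets, no disk, no wall-clock time.
    fn handle(&mut self, action: Action) {
        match action {
            Action::Connect(peer) => {
                self.connected.insert(peer);
            }
            Action::Disconnect(peer) => {
                self.connected.remove(&peer);
            }
            Action::Tick => self.ticks += 1,
        }
    }
}

/// Minimal linear congruential generator, standing in for proptest strategies.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Generate `n` actions over peer ids `0..peers` from a seed. `peers` must be > 0.
fn gen_actions(seed: u64, n: usize, peers: u8) -> Vec<Action> {
    let mut rng = Lcg(seed);
    (0..n)
        .map(|_| {
            let peer = (rng.next_u64() % peers as u64) as u8;
            match rng.next_u64() % 3 {
                0 => Action::Connect(peer),
                1 => Action::Disconnect(peer),
                _ => Action::Tick,
            }
        })
        .collect()
}

/// Apply each action and compare the FSM against a simple model after every step.
fn check(actions: &[Action], peers: u8) -> Result<(), String> {
    let mut fsm = ToyFsm::default();
    let mut expected_ticks = 0u64;
    for (i, action) in actions.iter().enumerate() {
        fsm.handle(*action);
        if *action == Action::Tick {
            expected_ticks += 1;
        }
        if fsm.ticks != expected_ticks {
            return Err(format!("step {i}: clock drifted after {action:?}"));
        }
        if fsm.connected.iter().any(|p| *p >= peers) {
            return Err(format!("step {i}: unknown peer connected"));
        }
    }
    Ok(())
}
```

Because the state machine is deterministic and the action list is plain data, a failing sequence can be replayed from its seed. A framework such as proptest additionally shrinks a failing sequence to a smaller one, which this sketch does not attempt.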

One important thing to note is in regard to message responses. We always send
responses to a request from the system under test (SUT) peer when a peer is