Commit a39a1a9

nexus: use all CockroachDB hosts from DNS to create DB connection URL. (#3783)
First pass at #3763 for crdb. Although we did query internal DNS, Nexus previously used only a single host when connecting to crdb. Because the internal DNS server always returns records in the same order, every Nexus instance was always using the same CockroachDB instance, even now that we've been provisioning multiple. It also meant that if that CRDB instance went down we'd be hosed (as seen in #3763).

To help with that, this PR changes Nexus to use all the CRDB hosts reported via Internal DNS when creating the connection URL. There are some comments in the code; this is still not quite as robust as it could be, but short of something cueball-like it's still an improvement.

To test, I disabled the crdb instance that Nexus initially connected to. Nexus recovered by connecting to the next crdb instance and continued serving requests. From the log we can see a successful query, connection errors once I disabled `fd00:1122:3344:101::5`, and then a successful query with the connection reestablished to the next crdb instance (`fd00:1122:3344:101::3`):

```
23:43:24.729Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result action = Query actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. }) resource = Database result = Ok(())
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable error_message = Connection error: server is shutting down
23:43:30.803Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): roles roles = RoleSet { roles: {} }
23:43:30.804Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result action = Query actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. }) resource = Database result = Ok(())
```
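
The recovery described above can be pictured with a small sketch. It assumes the database client tries the hosts of a multi-host URL in the order listed and settles on the first one that accepts a connection, as libpq documents for multi-host connection strings; whether the pool in Nexus behaves exactly this way is not confirmed here. `first_reachable` and the `try_connect` closure are hypothetical names used only for illustration, and the closure stands in for a real connection attempt.

```rust
use std::net::SocketAddrV6;

/// Hypothetical helper: returns the first address, in list order, for which
/// `try_connect` reports success. This models in-order failover only; it does
/// not open any sockets.
fn first_reachable<F>(addrs: &[SocketAddrV6], mut try_connect: F) -> Option<SocketAddrV6>
where
    F: FnMut(&SocketAddrV6) -> bool,
{
    addrs.iter().copied().find(|addr| try_connect(addr))
}

fn main() {
    // The first three hosts from the logged database_url, in the same order.
    let addrs: Vec<SocketAddrV6> = [
        "[fd00:1122:3344:101::5]:32221",
        "[fd00:1122:3344:101::3]:32221",
        "[fd00:1122:3344:101::6]:32221",
    ]
    .iter()
    .map(|s| s.parse().expect("valid socket address"))
    .collect();

    // Pretend the first host has been disabled, as in the manual test.
    let down = addrs[0];
    let chosen = first_reachable(&addrs, |addr| *addr != down);
    println!("{chosen:?}");
}
```

Under that assumption, every Nexus instance would still prefer the first host in the list while it is healthy, which is consistent with the "not quite as robust" caveat in the message.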
1 parent e53de82 commit a39a1a9

1 file changed: +16 -6 lines changed
nexus/src/context.rs

Lines changed: 16 additions & 6 deletions
@@ -168,16 +168,21 @@ impl ServerContext {
             nexus_config::Database::FromUrl { url } => url.clone(),
             nexus_config::Database::FromDns => {
                 info!(log, "Accessing DB url from DNS");
-                let address = loop {
+                // It's been requested but unfortunately not supported to directly
+                // connect using SRV based lookup.
+                // TODO-robustness: the set of cockroachdb hosts we'll use will be
+                // fixed to whatever we got back from DNS at Nexus start. This means
+                // a new cockroachdb instance won't picked up until Nexus restarts.
+                let addrs = loop {
                     match resolver
-                        .lookup_socket_v6(ServiceName::Cockroach)
+                        .lookup_all_socket_v6(ServiceName::Cockroach)
                         .await
                     {
-                        Ok(address) => break address,
+                        Ok(addrs) => break addrs,
                         Err(e) => {
                             warn!(
                                 log,
-                                "Failed to lookup cockroach address: {e}"
+                                "Failed to lookup cockroach addresses: {e}"
                             );
                             tokio::time::sleep(std::time::Duration::from_secs(
                                 1,
@@ -186,9 +191,14 @@ impl ServerContext {
                         }
                     }
                 };
-                info!(log, "DB address: {}", address);
+                let addrs_str = addrs
+                    .iter()
+                    .map(ToString::to_string)
+                    .collect::<Vec<_>>()
+                    .join(",");
+                info!(log, "DB addresses: {}", addrs_str);
                 PostgresConfigWithUrl::from_str(&format!(
-                    "postgresql://root@{address}/omicron?sslmode=disable",
+                    "postgresql://root@{addrs_str}/omicron?sslmode=disable",
                 ))
                 .map_err(|e| format!("Cannot parse Postgres URL: {}", e))?
             }
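
For readers who want to try the URL construction in isolation, here is a minimal standalone sketch of the same join-and-format step using only the standard library. It assumes the resolver hands back `SocketAddrV6` values, which the diff does not show explicitly; `multi_host_url` is a hypothetical helper name, not a function in Nexus.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Hypothetical helper mirroring the diff: joins the addresses with commas
/// and embeds them in a single multi-host PostgreSQL URL.
fn multi_host_url(addrs: &[SocketAddrV6]) -> String {
    let hosts = addrs
        .iter()
        .map(ToString::to_string) // renders each address as "[addr]:port"
        .collect::<Vec<_>>()
        .join(",");
    format!("postgresql://root@{hosts}/omicron?sslmode=disable")
}

fn main() {
    // Two of the addresses seen in the commit message's log output.
    let a: Ipv6Addr = "fd00:1122:3344:101::5".parse().expect("valid IPv6");
    let b: Ipv6Addr = "fd00:1122:3344:101::3".parse().expect("valid IPv6");
    let addrs = [
        SocketAddrV6::new(a, 32221, 0, 0),
        SocketAddrV6::new(b, 32221, 0, 0),
    ];
    println!("{}", multi_host_url(&addrs));
}
```

The `Display` output of `SocketAddrV6` already brackets the IPv6 address, which is why the joined string can be placed directly in the host portion of the URL.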
