Skip to content

Commit 84c9ffe

Browse files
authored
sled-agent-config-reconciler: actually call on_time_sync() (#8509)
@askfongjojo noticed that our `uptime`s on dogfood are all nonsense: ``` 8 BRM44220011 ok: 14:05:34 up 14066 day(s), 14:06, 1 user, load average: 5.54, 5.60, 5.62 9 BRM44220005 ok: 14:05:34 up 14066 day(s), 14:06, 1 user, load average: 17.51, 17.81, 17.81 10 BRM42220009 ok: 14:05:35 up 14066 day(s), 14:06, 1 user, load average: 15.00, 14.46, 14.01 11 BRM42220006 ok: 14:05:35 up 14066 day(s), 14:06, 0 users, load average: 11.49, 11.21, 11.10 12 BRM42220057 ok: 14:05:35 up 14066 day(s), 14:06, 0 users, load average: 2.88, 2.62, 2.03 ... ``` I _think_ #8064 introduced this. It shuffled around how time sync is checked and added a callback that the config-reconciler is supposed to run when it detects time has synchronized; that callback is responsible for rewriting `uptime` (among other things), but it never actually executes the callback. This PR fixes that. However, we have some racklettes that are running commits that include #8064 that have reasonable uptimes. I'm not sure how that's possible - is there some other way `uptime` can be correct if sled-agent doesn't fix it?
1 parent 38daa6e commit 84c9ffe

File tree

5 files changed

+14
-8
lines changed

5 files changed

+14
-8
lines changed

sled-agent/config-reconciler/src/reconciler_task.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -466,6 +466,12 @@ impl ReconcilerTask {
466466
// and also we want to report it as part of each reconciler result).
467467
let timesync_status = self.zones.check_timesync().await;
468468

469+
// Call back into sled-agent and let it do any work that needs to happen
470+
// once time is sync'd (e.g., rewrite `uptime`).
471+
if timesync_status.is_synchronized() {
472+
sled_agent_facilities.on_time_sync();
473+
}
474+
469475
// We conservatively refuse to start any new zones if any zones have
470476
// failed to shut down cleanly. This could be more precise, but we want
471477
// to avoid wandering into some really weird cases, such as:

sled-agent/config-reconciler/src/reconciler_task/zones.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1129,7 +1129,7 @@ mod tests {
11291129
&self.underlay_vnic
11301130
}
11311131

1132-
async fn on_time_sync(&self) {}
1132+
fn on_time_sync(&self) {}
11331133

11341134
async fn start_omicron_zone(
11351135
&self,

sled-agent/config-reconciler/src/sled_agent_facilities.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ pub trait SledAgentFacilities: Send + Sync + 'static {
2727
// currently implemented by `ServiceManager` and does a couple one-time
2828
// setup things (like rewrite the OS boot time). We could probably absorb
2929
// that work and remove this callback.
30-
fn on_time_sync(&self) -> impl Future<Output = ()> + Send;
30+
fn on_time_sync(&self);
3131

3232
/// Method to start a zone.
3333
// TODO-cleanup This is implemented by

sled-agent/src/services.rs

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3118,14 +3118,14 @@ impl ServiceManager {
31183118
///
31193119
/// This function only executes the out-of-band actions once, once the
31203120
/// synchronization state has shifted to true.
3121-
pub(crate) async fn on_time_sync(&self) {
3121+
pub(crate) fn on_time_sync(&self) {
31223122
if self
31233123
.inner
31243124
.time_synced
31253125
.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
31263126
.is_ok()
31273127
{
3128-
debug!(self.inner.log, "Time is now synchronized");
3128+
info!(self.inner.log, "Time is now synchronized");
31293129
// We only want to rewrite the boot time once, so we do it here
31303130
// when we know the time is synchronized.
31313131
self.boottime_rewrite();
@@ -3138,11 +3138,11 @@ impl ServiceManager {
31383138
// https://github.com/oxidecomputer/omicron/issues/8022.
31393139
let queue = self.metrics_queue();
31403140
match queue.notify_time_synced_sled(self.sled_id()) {
3141-
Ok(_) => debug!(
3141+
Ok(_) => info!(
31423142
self.inner.log,
31433143
"Notified metrics task that time is now synced",
31443144
),
3145-
Err(e) => error!(
3145+
Err(e) => warn!(
31463146
self.inner.log,
31473147
"Failed to notify metrics task that \
31483148
time is now synced, metrics may not be produced.";

sled-agent/src/sled_agent.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1336,8 +1336,8 @@ impl SledAgentFacilities for ReconcilerFacilities {
13361336
&self.etherstub_vnic
13371337
}
13381338

1339-
async fn on_time_sync(&self) {
1340-
self.service_manager.on_time_sync().await
1339+
fn on_time_sync(&self) {
1340+
self.service_manager.on_time_sync();
13411341
}
13421342

13431343
async fn start_omicron_zone(

0 commit comments

Comments
 (0)