Skip to content

Create a new NTP-admin service #8555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Create a new NTP-admin service #8555

wants to merge 1 commit into from

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jul 8, 2025

This exposes the same access to chrony that exists in config-reconciler/src/reconciler_task/zones.rs, but over an HTTP interface instead.

Fixes #8552

@smklein smklein force-pushed the ntp-admin branch 5 times, most recently from ee07393 to d2e15ed Compare July 10, 2025 22:42
}

fn main() {
if let Err(err) = oxide_tokio_rt::run(main_impl()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this service really needs 128 worker threads, or should we maybe consider building a single-threaded runtime here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not, but I was just copying the structure of the cockroach-admin server (which is also apparently launching 128 threads -- I doubt it needs that many)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use 8

}

let Ok(ref_id) = u32::from_str_radix(v[0], 16) else {
error!(log, "failed to parse ref_id");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be worthwhile to include the invalid value here (and subsequently)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

@smklein smklein marked this pull request as ready for review July 11, 2025 19:46
Copy link
Contributor

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! There are a few chronyc-specific bits that I don't really know about, so I just took a look at the other parts of this PR :)

Comment on lines 61 to 109
if v.len() < 10 {
error!(log, "too few fields"; "stdout" => &stdout);
return Err(TimeSyncError::FailedToParse {
reason: "too few fields",
stdout,
});
}

let Ok(ref_id) = u32::from_str_radix(v[0], 16) else {
error!(log, "failed to parse ref_id"; "stdout" => &stdout);
return Err(TimeSyncError::FailedToParse {
reason: "bad ref_id",
stdout,
});
};
let ip_addr =
IpAddr::from_str(v[1]).unwrap_or(Ipv6Addr::UNSPECIFIED.into());
let Ok(stratum) = u8::from_str(v[2]) else {
error!(log, "bad stratum"; "stdout" => &stdout);
return Err(TimeSyncError::FailedToParse {
reason: "bad stratum",
stdout,
});
};
let Ok(ref_time) = f64::from_str(v[3]) else {
error!(log, "bad ref_time"; "stdout" => &stdout);
return Err(TimeSyncError::FailedToParse {
reason: "bad ref_time",
stdout,
});
};
let Ok(correction) = f64::from_str(v[4]) else {
error!(log, "bad correction"; "stdout" => &stdout);
return Err(TimeSyncError::FailedToParse {
reason: "bad correction",
stdout,
});
};

// Per `chronyc waitsync`'s implementation, if either the
// reference IP address is not unspecified or the reference
// ID is not 0 or 0x7f7f0101, we are synchronized to a peer.
let peer_sync =
!ip_addr.is_unspecified() || (ref_id != 0 && ref_id != 0x7f7f0101);

let sync = stratum < 10
&& ref_time > 1234567890.0
&& peer_sync
&& correction.abs() <= 0.05;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be good to get some tests for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this is largely being lifted from

// XXXNTP - This could be replaced with a direct connection to the
// daemon using a patched version of the chrony_candm crate to allow
// a custom server socket path. From the GZ, it should be possible to
// connect to the UNIX socket at
// format!("{}/var/run/chrony/chronyd.sock", ntp_zone.root())
let stdout = running_ntp_zone
.run_cmd(&["/usr/bin/chronyc", "-c", "tracking"])
.map_err(TimeSyncError::ExecuteChronyc)?;
let v: Vec<&str> = stdout.split(',').collect();
if v.len() < 10 {
return Err(TimeSyncError::FailedToParse {
reason: "too few fields",
stdout,
});
}
let Ok(ref_id) = u32::from_str_radix(v[0], 16) else {
return Err(TimeSyncError::FailedToParse {
reason: "bad ref_id",
stdout,
});
};
let ip_addr =
IpAddr::from_str(v[1]).unwrap_or(Ipv6Addr::UNSPECIFIED.into());
let Ok(stratum) = u8::from_str(v[2]) else {
return Err(TimeSyncError::FailedToParse {
reason: "bad stratum",
stdout,
});
};
let Ok(ref_time) = f64::from_str(v[3]) else {
return Err(TimeSyncError::FailedToParse {
reason: "bad ref_time",
stdout,
});
};
let Ok(correction) = f64::from_str(v[4]) else {
return Err(TimeSyncError::FailedToParse {
reason: "bad correction",
stdout,
});
};
// Per `chronyc waitsync`'s implementation, if either the
// reference IP address is not unspecified or the reference
// ID is not 0 or 0x7f7f0101, we are synchronized to a peer.
let peer_sync =
!ip_addr.is_unspecified() || (ref_id != 0 && ref_id != 0x7f7f0101);
let sync = stratum < 10
&& ref_time > 1234567890.0
&& peer_sync
&& correction.abs() <= 0.05;
Ok(TimeSync { sync, ref_id, ip_addr, stratum, ref_time, correction })
, but I'll add tests regardless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Thanks for adding the tests though!

Comment on lines +19 to +27
<exec_method type='method' name='start'
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh'
timeout_seconds='0' />
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an entire method_script.sh file for just running a single command (exec /opt/oxide/ntp-admin/bin/ntp-admin "${args[@]}" &)?

How about adding the command directly here?

Suggested change
<exec_method type='method' name='start'
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh'
timeout_seconds='0' />
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' />
<exec_method type='method' name='start'
exec='ctrun -l child -o noorphan,regent /opt/oxide/ntp-admin/bin/ntp-admin run --config-file-path /opt/oxide/lib/svc/ntp-admin/config.toml --address %{config/address} &amp;'
timeout_seconds='0' />
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' />

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's cool with you, I'd still rather keep the method script. IMO it's more consistent with other SMF services, easier to read, and easy to modify the moment we need to add additional configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure consistency with other SMF services would be a huge factor here, I took a look and only the clickhouse, cockroach-admin, and cockroachdb services have a separate method_script.sh file. That said, I don't have strong feelings about this, so I'll leave it up to you :)

@smklein smklein force-pushed the ntp-admin branch 2 times, most recently from 5cdb66b to cd0aeb3 Compare July 14, 2025 22:00
Copy link
Contributor

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments, looks good to me!

Comment on lines +19 to +27
<exec_method type='method' name='start'
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh'
timeout_seconds='0' />
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure consistency with other SMF services would be a huge factor here, I took a look and only the clickhouse, cockroach-admin, and cockroachdb services have a separate method_script.sh file. That said, I don't have strong feelings about this, so I'll leave it up to you :)

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

Not as a part of this PR - should we add an internal DNS entry for this server? (I think we discussed wanting that for cockroach-admin too, but I'm not finding an issue for it - maybe it was only in PR comments?)

@smklein
Copy link
Collaborator Author

smklein commented Jul 15, 2025

Thanks, LGTM.

Not as a part of this PR - should we add an internal DNS entry for this server? (I think we discussed wanting that for cockroach-admin too, but I'm not finding an issue for it - maybe it was only in PR comments?)

I think #6796 and #8496 is kinda a related issue too -- these are cases of:

  • We have a zone, with a service, with an IP + Port combo in-use and known to DNS
  • We have a second service in that zone, at a well-known port
  • Is it described by OmicronZoneType + Internal DNS? (in this case, "no, not yet")

My plan was "add the service, use it, ensure that works, then add it to the zone type + DNS for completeness". I kinda wanted to separate "does the service work" from "ensuring we can find this thing in a backwards compatible way".

WDYT? I could try to fix both at once, if we prefer

@jgallagher
Copy link
Contributor

My plan was "add the service, use it, ensure that works, then add it to the zone type + DNS for completeness". I kinda wanted to separate "does the service work" from "ensuring we can find this thing in a backwards compatible way".

100% on board with this plan. It might be right to take "ensuring we can find this thing in a backwards compatible way" on as tech debt intentionally; it's worth fixing but doesn't seem particularly urgent to me?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

want NTP HTTP api for access by inventory
4 participants