-
Notifications
You must be signed in to change notification settings - Fork 45
Create a new NTP-admin service #8555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ee07393
to
d2e15ed
Compare
ntp-admin/src/bin/ntp-admin.rs
Outdated
} | ||
|
||
fn main() { | ||
if let Err(err) = oxide_tokio_rt::run(main_impl()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think this service really needs 128 worker threads, or should we maybe consider building a single-threaded runtime here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not, but I was just copying the structure of the cockroach-admin server (which is also apparently launching 128 threads -- I doubt it needs that many)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use 8
ntp-admin/src/http_entrypoints.rs
Outdated
} | ||
|
||
let Ok(ref_id) = u32::from_str_radix(v[0], 16) else { | ||
error!(log, "failed to parse ref_id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be worthwhile to include the invalid value here (and subsequently)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! There are a few chronyc-specific bits that I don't really know about, so I just took a look at the other parts of this PR :)
ntp-admin/src/http_entrypoints.rs
Outdated
if v.len() < 10 { | ||
error!(log, "too few fields"; "stdout" => &stdout); | ||
return Err(TimeSyncError::FailedToParse { | ||
reason: "too few fields", | ||
stdout, | ||
}); | ||
} | ||
|
||
let Ok(ref_id) = u32::from_str_radix(v[0], 16) else { | ||
error!(log, "failed to parse ref_id"; "stdout" => &stdout); | ||
return Err(TimeSyncError::FailedToParse { | ||
reason: "bad ref_id", | ||
stdout, | ||
}); | ||
}; | ||
let ip_addr = | ||
IpAddr::from_str(v[1]).unwrap_or(Ipv6Addr::UNSPECIFIED.into()); | ||
let Ok(stratum) = u8::from_str(v[2]) else { | ||
error!(log, "bad stratum"; "stdout" => &stdout); | ||
return Err(TimeSyncError::FailedToParse { | ||
reason: "bad stratum", | ||
stdout, | ||
}); | ||
}; | ||
let Ok(ref_time) = f64::from_str(v[3]) else { | ||
error!(log, "bad ref_time"; "stdout" => &stdout); | ||
return Err(TimeSyncError::FailedToParse { | ||
reason: "bad ref_time", | ||
stdout, | ||
}); | ||
}; | ||
let Ok(correction) = f64::from_str(v[4]) else { | ||
error!(log, "bad correction"; "stdout" => &stdout); | ||
return Err(TimeSyncError::FailedToParse { | ||
reason: "bad correction", | ||
stdout, | ||
}); | ||
}; | ||
|
||
// Per `chronyc waitsync`'s implementation, if either the | ||
// reference IP address is not unspecified or the reference | ||
// ID is not 0 or 0x7f7f0101, we are synchronized to a peer. | ||
let peer_sync = | ||
!ip_addr.is_unspecified() || (ref_id != 0 && ref_id != 0x7f7f0101); | ||
|
||
let sync = stratum < 10 | ||
&& ref_time > 1234567890.0 | ||
&& peer_sync | ||
&& correction.abs() <= 0.05; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be good to get some tests for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this is largely being lifted from
omicron/sled-agent/config-reconciler/src/reconciler_task/zones.rs
Lines 503 to 560 in 248e7ed
// XXXNTP - This could be replaced with a direct connection to the | |
// daemon using a patched version of the chrony_candm crate to allow | |
// a custom server socket path. From the GZ, it should be possible to | |
// connect to the UNIX socket at | |
// format!("{}/var/run/chrony/chronyd.sock", ntp_zone.root()) | |
let stdout = running_ntp_zone | |
.run_cmd(&["/usr/bin/chronyc", "-c", "tracking"]) | |
.map_err(TimeSyncError::ExecuteChronyc)?; | |
let v: Vec<&str> = stdout.split(',').collect(); | |
if v.len() < 10 { | |
return Err(TimeSyncError::FailedToParse { | |
reason: "too few fields", | |
stdout, | |
}); | |
} | |
let Ok(ref_id) = u32::from_str_radix(v[0], 16) else { | |
return Err(TimeSyncError::FailedToParse { | |
reason: "bad ref_id", | |
stdout, | |
}); | |
}; | |
let ip_addr = | |
IpAddr::from_str(v[1]).unwrap_or(Ipv6Addr::UNSPECIFIED.into()); | |
let Ok(stratum) = u8::from_str(v[2]) else { | |
return Err(TimeSyncError::FailedToParse { | |
reason: "bad stratum", | |
stdout, | |
}); | |
}; | |
let Ok(ref_time) = f64::from_str(v[3]) else { | |
return Err(TimeSyncError::FailedToParse { | |
reason: "bad ref_time", | |
stdout, | |
}); | |
}; | |
let Ok(correction) = f64::from_str(v[4]) else { | |
return Err(TimeSyncError::FailedToParse { | |
reason: "bad correction", | |
stdout, | |
}); | |
}; | |
// Per `chronyc waitsync`'s implementation, if either the | |
// reference IP address is not unspecified or the reference | |
// ID is not 0 or 0x7f7f0101, we are synchronized to a peer. | |
let peer_sync = | |
!ip_addr.is_unspecified() || (ref_id != 0 && ref_id != 0x7f7f0101); | |
let sync = stratum < 10 | |
&& ref_time > 1234567890.0 | |
&& peer_sync | |
&& correction.abs() <= 0.05; | |
Ok(TimeSync { sync, ref_id, ip_addr, stratum, ref_time, correction }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Thanks for adding the tests though!
<exec_method type='method' name='start' | ||
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh' | ||
timeout_seconds='0' /> | ||
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an entire method_script.sh
file for just running a single command (exec /opt/oxide/ntp-admin/bin/ntp-admin "${args[@]}" &
)?
How about adding the command directly here?
<exec_method type='method' name='start' | |
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh' | |
timeout_seconds='0' /> | |
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' /> | |
<exec_method type='method' name='start' | |
exec='ctrun -l child -o noorphan,regent /opt/oxide/ntp-admin/bin/ntp-admin run --config-file-path /opt/oxide/lib/svc/ntp-admin/config.toml --address %{config/address} &' | |
timeout_seconds='0' /> | |
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's cool with you, I'd still rather keep the method script. IMO it's more consistent with other SMF services, easier to read, and easy to modify the moment we need to add additional configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure consistency with other SMF services would be a huge factor here, I took a look and only the clickhouse
, cockroach-admin
, and cockroachdb
services have a separate method_script.sh file. That said, I don't have strong feelings about this, so I'll leave it up to you :)
5cdb66b
to
cd0aeb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments, looks good to me!
<exec_method type='method' name='start' | ||
exec='/opt/oxide/lib/svc/manifest/ntp-admin.sh' | ||
timeout_seconds='0' /> | ||
<exec_method type='method' name='stop' exec=':kill' timeout_seconds='0' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure consistency with other SMF services would be a huge factor here, I took a look and only the clickhouse
, cockroach-admin
, and cockroachdb
services have a separate method_script.sh file. That said, I don't have strong feelings about this, so I'll leave it up to you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM.
Not as a part of this PR - should we add an internal DNS entry for this server? (I think we discussed wanting that for cockroach-admin too, but I'm not finding an issue for it - maybe it was only in PR comments?)
I think #6796 and #8496 is kinda a related issue too -- these are cases of:
My plan was "add the service, use it, ensure that works, then add it to the zone type + DNS for completeness". I kinda wanted to separate "does the service work" from "ensuring we can find this thing in a backwards compatible way". WDYT? I could try to fix both at once, if we prefer |
100% on board with this plan. It might be right to take "ensuring we can find this thing in a backwards compatible way" on as tech debt intentionally; it's worth fixing but doesn't seem particularly urgent to me? |
This exposes the same access to chrony that exists in
config-reconciler/src/reconciler_task/zones.rs
, but over an HTTP interface instead.Fixes #8552