Skip to content

CP-53477 Update API/CLI to Support Dom0 SSH Control #6385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

LunfanZhang
Copy link
Contributor

@LunfanZhang LunfanZhang commented Mar 24, 2025

This PR introduces support for Dom0 SSH control, providing the following capabilities:

Query the SSH status.
Configure a temporary SSH enable timeout for a specific host or all hosts in the pool.
Configure the console idle timeout for a specific host or all hosts in the pool.

Changes
New Host Object Fields:

  • ssh_enabled: Indicates whether SSH is enabled.
  • ssh_enabled_timeout: Specifies the timeout for temporary SSH enablement.
  • ssh_expiry: Tracks the expiration time for temporary SSH enablement.
  • console_idle_timeout: Configures the idle timeout for the console.

New Host/Pool APIs:

  • set_ssh_enable_timeout: Allows setting a temporary timeout for enabling the SSH service.
  • set_console_timeout: Allows configuring the console idle timeout.

…onfigure

Add new host object fields:
  - ssh_enabled
  - ssh_enabled_timeout
  - ssh_expiry
  - console_idle_timeout
Add new host/pool API to enable to set a temporary enabled SSH service timeout
  - set_ssh_enable_timeout
Add new host/pool API to enable to set console timeout
  - set_console_timeout

Signed-off-by: Lunfan Zhang <Lunfan.Zhang@cloud.com>
@BengangY
Copy link
Contributor

All codes are in only one commit. I would suggest split them into at least 3 commits:

  1. Datamodel define.
  2. Testing code.
  3. API.

@robhoes
Copy link
Member

robhoes commented Mar 24, 2025

note: the PR has involved datamodel change which results the SDK build failed

That needs to be fixed then...

@BengangY BengangY self-requested a review March 24, 2025 10:35
~default_value:(Some (VDateTime Date.epoch)) "ssh_expiry"
"The time when SSH access will expire"
; field ~qualifier:RW ~lifecycle:[] ~ty:Int
~default_value:(Some (VInt 0L)) "console_idle_timeout"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do ssh_enabled_timeout and console_idle_timeout have different unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user perspective, the SSH service timeout should be longer than the idle session timeout. Generally, users might set ssh_enabled_timeout to 30 minutes and console_idle_timeout to 600 seconds, as defined in the REQ. I assume this is reasonable.

@@ -2368,6 +2368,29 @@ let disable_ssh =
~params:[(Ref _host, "self", "The host")]
~allowed_roles:_R_POOL_ADMIN ()

let set_ssh_enable_timeout =
call ~name:"set_ssh_enable_timeout " ~lifecycle:[]
Copy link
Contributor

@BengangY BengangY Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SDK failure may be caused by the blank space after set_ssh_enable_timeout ...

@@ -2368,6 +2368,29 @@ let disable_ssh =
~params:[(Ref _host, "self", "The host")]
~allowed_roles:_R_POOL_ADMIN ()

let set_ssh_enable_timeout =
call ~name:"set_ssh_enable_timeout " ~lifecycle:[]
~doc:"Set the SSH service enabled timeout for the host"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the enable or enabled timeout? There is a contradiciton between the documentation and the name of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_ssh_enabled_timeout

Copy link
Contributor Author

@LunfanZhang LunfanZhang Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original definition is set_ssh_enable_timeout, but it is conflict with the db setter for new fields ssh_enabled_timeout.
How about set_enabled_ssh_timeout or set_ssh_timeout?

(Ref _host, "self", "The host")
; ( Int
, "timeout"
, "The SSH enabled timeout in minutes (0 means no timeout, max 2880)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most time values are seconds in the API. So this is likely to cause errors and there is no hint in the names that this is minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I will update to sec

~params:
[
(Ref _host, "self", "The host")
; (Int, "console_timeout", "The idle console timeout in seconds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case in point: here we use seconds.

@@ -204,6 +204,20 @@ let get_pbds_host rpc session_id pbds =
let get_sr_host rpc session_id record =
get_pbds_host rpc session_id record.API.sR_PBDs

let get_unified_field ~rpc ~session_id ~getter ~transform ~default =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the function? Its semantics seem unusual. Please add documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function iterates over specific fields retrieved from all the hosts in a pool. If all the fields have the same value, it prints that value; otherwise, it prints the default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is about reporting, I believe the maximum should be reported. If this is about using a value, the minimum should be used. This is a security feature and I can't see the purpose of a default if values don't agree.

@@ -3112,22 +3114,167 @@ let emergency_clear_mandatory_guidance ~__context =
) ;
Db.Host.set_pending_guidances ~__context ~self ~value:[]

let validate_timeout timeout =
if timeout < 0L || timeout > 2880L then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum is 48h. It would be better expressed as 48*60. But I question that such a high timeout is even sensible. As I said before - this is the API and would prefer to keep times to seconds. The XE interface could still accept minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with the architecture team and the PM, and we agreed to consider this limitation as acceptable for a weekend day.
and I agree to keep the time in seconds for better accuracy control, aligning with the general practices of XAPI.

let current_time = Date.now () in

let expiry_time =
let current_secs = Date.to_unix_time current_time in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be expressed using time spans. @psafont

Copy link
Contributor Author

@LunfanZhang LunfanZhang Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I do not get the point, could you explain more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid time calculations as the one above and prefer to use Ptime - but @psafont is the expert.

Helpers.call_script "/bin/bash" ["-c"; script] |> Result.ok
| timeout ->
let script =
Printf.sprintf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I like modifying a file by creating a shell script that does this - rather than doing it directly. What does the source /root/.bashrc when executed here achieve?


let set_ssh_enable_timeout ~__context ~self ~timeout =
let uuid = host_uuid ~__context self in
info "Host.set_ssh_enable_timeout : host = '%s'" uuid ;
Copy link
Contributor

@BengangY BengangY Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use __FUNCTION__ to replace Host.set_ssh_enable_timeout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, for these ones in message_forwarding, please leave it as it, as it is meant to show the API call that was made (not the OCaml function). But please add the timeout value.

)
hosts

let set_console_timeout ~__context ~self:_ ~console_timeout =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one of the hosts fails?

val set_console_timeout :
__context:Context.t -> self:API.ref_host -> console_timeout:int64 -> unit

val schedule_disable_job :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schedule_disable_job is only used inside xapi_host, so it doesn't need to be added to mli.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be exposed, because we will use it in another PR.

Db.Host.set_ssh_enabled ~__context ~self ~value:false ;
debug "Successfully disabled SSH for host %s" host_uuid
with e ->
error "Failed to disable SSH for host %s: %s" host_uuid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the exception be thrown out?

| 0L ->
remove_disable_job ~__context ~self
| t ->
schedule_disable_job ~__context ~self ~timeout:(Int64.mul t 60L)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If enable and start are successful, but schedule_disable_job is failed, then the ssh service will always be on. This is not expected. How to handle this scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to handle this because the schedule_disable_job is an asynchronous job.

else
timeout

let remove_disable_job ~__context ~self =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest renaming it to remove_disable_ssh_job.

let task_name = Printf.sprintf "disable_ssh_for_host_%s" host_uuid in
Xapi_stdext_threads_scheduler.Scheduler.remove_from_queue task_name

let schedule_disable_job ~__context ~self ~timeout =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest renaming it to schedule_disable_ssh_job.

Db.Host.set_ssh_enabled_timeout ~__context ~self ~value:timeout
)

let set_console_timeout ~__context ~self ~console_timeout =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for both "ssh session timeout" and "VNC session timeout"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


let configure_timeout = function
| 0L ->
let script =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the script lines with 0L and timeout be combined into one line with a variable?

@@ -3112,22 +3114,167 @@ let emergency_clear_mandatory_guidance ~__context =
) ;
Db.Host.set_pending_guidances ~__context ~self ~value:[]

let validate_timeout timeout =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it will not be used in other places and it is with a general name, pls make it an inner func of set_ssh_enable_timeout

@@ -1571,6 +1571,33 @@ let disable_ssh =
~params:[(Ref _pool, "self", "The pool")]
~allowed_roles:_R_POOL_ADMIN ()

let set_ssh_enable_timeout =
call ~name:"set_ssh_enable_timeout " ~lifecycle:[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the space in tail.

@@ -204,6 +204,20 @@ let get_pbds_host rpc session_id pbds =
let get_sr_host rpc session_id record =
get_pbds_host rpc session_id record.API.sR_PBDs

let get_unified_field ~rpc ~session_id ~getter ~transform ~default =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to change it to a more specified name, how about: get_common_value_from_all_hosts

if List.for_all (( = ) first_value) all_values then
first_value
else
default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print a log for the detailed differences in the hosts.

@LunfanZhang
Copy link
Contributor Author

LunfanZhang commented Mar 25, 2025

All codes are in only one commit. I would suggest split them into at least 3 commits

Hi @BengangY @lindig @gangj , Thanks for your comments above. to ensure rest (e.g., pool-join related) work is not blocked while waiting for the PR review, I will

  1. split the data model changes into an independent PR and review data model first
  2. convert the above PR into a draft one, before data model PR merged, I will use this PR and fix the comments about new added API
  3. after data model PR is merged, I will close this PR and submit a other PR for new added API and CLI change.

@LunfanZhang LunfanZhang changed the title CP-53477 Update Host/Pool Data model to Support Dom0 SSH Control CP-53477 Update API/CLI to Support Dom0 SSH Control Mar 25, 2025
@LunfanZhang LunfanZhang marked this pull request as draft March 25, 2025 08:26
@robhoes
Copy link
Member

robhoes commented Mar 25, 2025

Close this for now, as this PR cannot be reviewed yet. PR #6388 needs to be reviewed first, and this PR conflicts with it.

@robhoes robhoes closed this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants