
CP-48625, CP-48752: VM anti-affinity support for host evacuation and related UT #5599


Conversation

@gangj gangj (Contributor) commented Apr 29, 2024

The host evacuation plan with anti-affinity support will be carried out in 3 steps (a rough sketch follows the list):

  1. Try to get a "spread evenly" plan for the anti-affinity VMs; done if all the
     remaining VMs can then be planned using binpack, otherwise continue.
  2. Try to get a "no breach" plan for the anti-affinity VMs; done if all the
     remaining VMs can then be planned using binpack, otherwise continue.
  3. Carry out a binpack plan ignoring VM anti-affinity.
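A rough OCaml sketch of this three-step fallback; the step functions passed in are illustrative placeholders, not this PR's actual API:

let host_evacuation_plan ~spread_evenly_plan ~no_breach_plan ~binpack_rest ~binpack_all =
  (* keep an anti-affinity plan only if every remaining VM still binpacks
     onto the capacity that plan leaves behind *)
  let complete aa_plan =
    Option.map (fun rest -> aa_plan @ rest) (binpack_rest aa_plan)
  in
  match complete (spread_evenly_plan ()) with
  | Some plan ->
      plan (* step 1 *)
  | None -> (
    match complete (no_breach_plan ()) with
    | Some plan ->
        plan (* step 2 *)
    | None ->
        binpack_all () (* step 3: ignore anti-affinity *)
  )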

@gangj gangj force-pushed the private/gangj/CP-48625 branch 3 times, most recently from bf8cf27 to e35169d Compare April 29, 2024 09:45
@gangj gangj force-pushed the private/gangj/CP-48625 branch from ea37fca to c9744d0 Compare May 1, 2024 14:32
@gangj gangj force-pushed the private/gangj/CP-48625 branch from c9744d0 to 6be3862 Compare May 1, 2024 14:58
@gangj gangj force-pushed the private/gangj/CP-48625 branch from b8d483d to daa9db6 Compare May 7, 2024 11:20
@gangj gangj force-pushed the private/gangj/CP-48625 branch from 147fe8f to dc98eb8 Compare May 7, 2024 15:56
@gangj gangj force-pushed the private/gangj/CP-48625 branch from 4496714 to 69d3a78 Compare May 8, 2024 09:39
@minglumlu minglumlu (Member) left a comment:

I think one of the goals of this change is to keep the time complexity at O(M * log H), where M is the number of VMs to be migrated and H is the number of hosts.
For example, the following data structures are used for this purpose when planning for spreading evenly:

  1. A map (host -> free memory): int64 HostMap.t
  2. A two-layer map (vm_group -> host -> vm_cnt): int HostMap.t VMGroupMap.t
  3. A map (vm_group -> set of (vm_cnt, free_mem, h)): HostsSet.t VMGroupMap.t

For each VM placement scheduled on one host:
Data 1 is updated in O(log H).
Data 2 is updated in O(G * log H), where G is the number of VM groups.
Data 3 is updated in O(log H), as the set keeps the order.

Currently we assume G is a constant, so overall it is still O(M * log H).
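A minimal sketch of these three structures, assuming hosts and VM groups are keyed by their string refs and that the module names mirror the ones above:

module HostMap = Map.Make (String)
module VMGroupMap = Map.Make (String)

module HostsSet = Set.Make (struct
  (* (vm_cnt, free_mem, host): ordering by VM count first, then free memory,
     lets the planner pick the next candidate host in O(log H) *)
  type t = int * int64 * string
  let compare = compare
end)

type spread_evenly_state = {
    host_free_mem: int64 HostMap.t (* data 1 *)
  ; group_host_vm_cnt: int HostMap.t VMGroupMap.t (* data 2 *)
  ; group_hosts: HostsSet.t VMGroupMap.t (* data 3 *)
}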

==== update ====
Based on the above, psq (suggested by Edwin in another communication channel) looks like the right data structure. Alternatively, a home-brewed module would also keep the code compact. E.g. (just some snippets):

module type S = sig
  type t
  type k
  type v

  val add : k -> v -> t -> t
  val update : k -> (v option -> v option) -> t -> t
  val pop : t -> (k * v) option
end

module SearchQueue (K : Map.OrderedType) (V : Map.OrderedType) :
  S with type k = K.t and type v = V.t = struct
  type k = K.t
  type v = V.t

  module M = Map.Make (K)
  module S = Set.Make (struct
    type t = V.t * K.t
    let compare (v1, k1) (v2, k2) =
      match V.compare v1 v2 with 0 -> K.compare k1 k2 | c -> c
  end)

  (* the map gives O(log) access by key; the set keeps entries ordered by value *)
  type t = {m: v M.t; s: S.t}

  let add k v t = {m= M.add k v t.m; s= S.add (v, k) t.s}

  let update k f t =
    match M.find_opt k t.m with
    | Some v -> (
        let s = S.remove (v, k) t.s in
        match f (Some v) with
        | Some v' ->
            {m= M.add k v' t.m; s= S.add (v', k) s}
        | None ->
            {m= M.remove k t.m; s}
      )
    | None ->
        t

  let pop t = Option.map (fun (v, k) -> (k, v)) (S.min_elt_opt t.s)
end

module HostKey = struct
  type t = string
  let compare = compare
end

module HostPlanState = struct
  type t = int * int
  let compare = ...
end

module PlanState = SearchQueue (HostKey) (HostPlanState)

(* group' is the anti-affinity group of the VM that has just been placed on host h *)
let update_all_plan_states h group' vm_size all_states =
  let update' = Option.map (fun (cnt, free) -> (cnt + 1, free - vm_size)) in
  let update'' = Option.map (fun (cnt, free) -> (cnt, free - vm_size)) in
  VMGroupMap.mapi
    (fun group sq ->
      if group = group' then PlanState.update h update' sq
      else PlanState.update h update'' sq
    )
    all_states

It is not necessary to use data 1 and data 2 above; instead, for each VM group, they are wrapped into a SearchQueue.

@gangj gangj (Contributor, Author) commented May 17, 2024

The data 2 is updated in O(G * log H) where the G is the number of VM groups
The data 3 is updated in O(log H), as the set can keep the order.

Thanks for the suggestion, discussion and review!
One small update:
Data 2 is updated in O(log G * log H), and data 3 is also updated in O(log G * log H),
as both of them are inside a VMGroupMap.

@gangj gangj force-pushed the private/gangj/CP-48625 branch 3 times, most recently from ae75f80 to 1827620 Compare May 17, 2024 07:41
Comment on lines 300 to 340
hosts
|> HostsSet.remove (vm_cnt, host_size, host)
|>
match grp = group with
| true ->
    HostsSet.add (vm_cnt + 1, updated_host_size, host)
| false ->
    HostsSet.add (vm_cnt, updated_host_size, host)
Contributor:

It took me a while to figure out that the following is a function:

  match grp = group with
      | true ->
          HostsSet.add (vm_cnt + 1, updated_host_size, host)
      | false ->
          HostsSet.add (vm_cnt, updated_host_size, host)

I think we can refactor it to be more explicit:

      let hosts = HostsSet.remove (vm_cnt, host_size, host) hosts in
      match grp = group with
      | true ->
          HostsSet.add (vm_cnt + 1, updated_host_size, host) hosts
      | false ->
          HostsSet.add (vm_cnt, updated_host_size, host) hosts


let update_host_size_map host vm_size host_size_map =
  Xapi_vm_helpers.HostMap.update host
    (fun host_size -> Int64.sub (host_size |> Option.get) vm_size |> Option.some)
Contributor:

Can host_size be None?

Contributor Author:

No, the Option.get is there because the host should always exist in the map.

Member:

It will be None if host is not in the host_size_map, and then Option.get will raise an exception. I think we are relying on init_host_size_map to have added the host here? It would be better not to risk raising exceptions here, as they won't be caught.
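A minimal sketch of one way to avoid the Option.get here, assuming the same Xapi_vm_helpers.HostMap.update API: a host that is somehow missing from the map is left untouched instead of raising:

let update_host_size_map host vm_size host_size_map =
  Xapi_vm_helpers.HostMap.update host
    (* Option.map leaves an absent host unchanged rather than raising *)
    (Option.map (fun host_size -> Int64.sub host_size vm_size))
    host_size_map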

@gangj gangj force-pushed the private/gangj/CP-48625 branch from 3e05c05 to 925bdda Compare May 21, 2024 07:19
@gangj gangj (Contributor, Author) commented May 21, 2024

==== update ==== Based on the above, psq (suggested by Edwin in another communication channel) looks like the right data structure. Alternatively, a home-brewed module would also keep the code compact (snippets above).

It is not necessary to use data 1 and data 2 above; instead, for each VM group, they are wrapped into a SearchQueue.

Changed to the Psq version, please help review again, thanks!



(Option.value ~default:Xapi_vm_helpers.HostMap.empty host_vm_cnt_map)
)
)
grp_host_vm_cnt_map
Member:

The Option.get will raise an exception if the host_vm_cnt_map = None, i.e. the group is not in the map.

@gangj gangj force-pushed the private/gangj/CP-48625 branch 2 times, most recently from 2fb7a48 to 4f922ea Compare May 23, 2024 09:49
Comment on lines 381 to 389
| Some (host, (_vm_cnt, h_size)) -> (
    match vm_size <= h_size with
    | true ->
        Some host
    | _ ->
        select_host_for_spread_evenly_plan vm vm_size
          (HostsSet.remove (vm_cnt, host_size, host) hosts_set)
        hosts_queue
        |> AntiAffinityEvacuationPlanHostQueue.remove host
        |> select_host_for_spread_evenly_plan vm vm_size
  )
Contributor:

How about using a guard here:

Suggested change
| Some (host, (_vm_cnt, h_size)) -> (
    match vm_size <= h_size with
    | true ->
        Some host
    | _ ->
        select_host_for_spread_evenly_plan vm vm_size
          (HostsSet.remove (vm_cnt, host_size, host) hosts_set)
        hosts_queue
        |> AntiAffinityEvacuationPlanHostQueue.remove host
        |> select_host_for_spread_evenly_plan vm vm_size
  )
| Some (host, (_vm_cnt, h_size)) when vm_size <= h_size ->
    Some host
| Some (host, _) ->
    hosts_queue
    |> AntiAffinityEvacuationPlanHostQueue.remove host
    |> select_host_for_spread_evenly_plan vm vm_size

Gang Ji added 7 commits May 23, 2024 17:59
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Host evacuation plan with anti-affinity support will be carried out in 3
steps:

1. Try to get a "spread evenly" plan for anti-affinity VMs, done if
   all the rest VMs got planned using binpack, otherwise continue.
2. Try to get a "no breach" plan for anti-affinity VMs, done if all the
   rest VMs got planned using binpack, otherwise continue.
3. Carry out a binpack plan ignoring VM anti-affinity.

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Add "keep_localhost" to enable test pool with more than 3 hosts.

Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
@gangj gangj force-pushed the private/gangj/CP-48625 branch from 4f922ea to 96348f1 Compare May 23, 2024 10:00
(*****************************************************************************************************)
(* Planning code follows *)

let rec select_host_for_spread_evenly_plan vm vm_size hosts_queue =
Contributor Author:

Merge this one and select_host_for_no_breach_plan below.

2. For each anti-affinity VM, select a host which can run it, and which has the minimal number of
VMs in the same anti-affinity group running on it, for the hosts with the same number of running
VMs in that group, pick the one with the minimal free memory. *)
let rec compute_spread_evenly_plan ~__context pool_state anti_affinity_vms
Contributor Author:

change to List.fold_left
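A minimal sketch of that refactoring, with assumed names and signatures: the explicit recursion over the anti-affinity VMs becomes a List.fold_left that threads the accumulated plan and the pool state:

let compute_spread_evenly_plan pool_state anti_affinity_vms ~place_vm =
  (* place_vm returns Some (host, updated_state), or None if no host fits *)
  List.fold_left
    (fun (plan, state) (vm, vm_size) ->
      match place_vm state vm vm_size with
      | Some (host, state') ->
          ((vm, host) :: plan, state')
      | None ->
          (plan, state)
    )
    ([], pool_state) anti_affinity_vms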

2. For each anti-affinity VM, try to select a host for it so that there are at least 2 hosts which
has running VMs in the same anti-affinity group. If there are already 2 hosts having running VMs
in that group, skip planning for the VM. *)
let rec compute_no_breach_plan ~__context total_hosts pool_state
Contributor Author:

change to List.fold_left

2. For each anti-affinity VM, try to select a host for it so that there are at least 2 hosts which
has running VMs in the same anti-affinity group. If there are already 2 hosts having running VMs
in that group, skip planning for the VM. *)
let rec compute_no_breach_plan ~__context total_hosts pool_state
Contributor Author:

check total_hosts before the plan

debug "Plan OK" ; plan
)
in
plan_in_steps ~__context
Contributor Author:

change to monad
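A minimal sketch of that idea, with placeholder step functions: chain the steps with an "or else" combinator over option, so a later step only runs when the earlier one produced no complete plan:

(* keep the first step that yields a complete plan *)
let ( <|> ) earlier later =
  match earlier with Some plan -> Some plan | None -> later ()

let plan_in_steps ~spread_evenly ~no_breach ~binpack =
  match spread_evenly () <|> no_breach with
  | Some plan ->
      plan
  | None ->
      binpack () (* the final binpack ignores anti-affinity and always yields a plan *)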

Comment on lines +351 to +353
(Option.map (fun (_vm_cnt, h_size) ->
     (_vm_cnt, Int64.sub h_size vm_size)
 )
Contributor:

_vm_cnt should be vm_cnt.

   2. Try to get a "no breach" plan for anti-affinity VMs, and then a binpack plan for the
      rest of VMs. Done if every VM got planned, otherwise continue.
   3. Carry out a binpack plan ignoring VM anti-affinity. *)
let compute_anti_affinity_evacuation_plan ~__context total_hosts hosts vms =
  let config =
    {Binpack.hosts; vms; placement= []; total_hosts; num_failures= 1}
Contributor:

Why is num_failures 1, or is it just starting from 1?

Contributor Author:

I think it is used by binpack for the HA plan only; it is not used for host evacuation.

@gangj gangj (Contributor, Author) commented May 24, 2024

Continues here: #5652

@gangj gangj closed this May 24, 2024