CP-48625, CP-48752: VM anti-affinity support for host evacuation and related UT #5599
Conversation
I think one of the goals of this change is to keep the time complexity at O(M * log H), where M is the number of VMs to be migrated and H is the number of hosts. E.g. the following data structures are used for this purpose in planning for spreading evenly:

- Data 1, a map (host -> free memory): `int64 HostMap.t`
- Data 2, a two-layer map (vm_group -> host -> vm_cnt): `int HostMap.t VMGroupMap.t`
- Data 3, a map (vm_group -> set of `(vm_cnt, free_mem, h)`): `HostsSet.t VMGroupMap.t`

For each VM placement scheduled on one host:

- Data 1 is updated in O(log H).
- Data 2 is updated in O(G * log H), where G is the number of VM groups.
- Data 3 is updated in O(log H), as the set keeps its elements ordered.

Currently we assume G is a constant number, so overall it is still O(M * log H).
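For concreteness, here is a minimal sketch of these three structures as OCaml types; the string keys and the record wrapper are illustrative assumptions, not the PR's code:

```ocaml
module HostMap = Map.Make (String)
module VMGroupMap = Map.Make (String)

module HostsSet = Set.Make (struct
  (* (vm_cnt, free_mem, host): ordered by VM count first, then free memory *)
  type t = int * int64 * string

  let compare = Stdlib.compare
end)

type spread_evenly_state = {
    host_free_mem: int64 HostMap.t (* data 1: host -> free memory *)
  ; group_host_vm_cnt: int HostMap.t VMGroupMap.t (* data 2: vm_group -> host -> vm_cnt *)
  ; group_hosts: HostsSet.t VMGroupMap.t (* data 3: vm_group -> ordered host set *)
}
```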
==== update ====

Based on the above, psq (suggested by Edwin in another communication channel) looks like the right data structure. Alternatively, a home-brewed module would make the code more compact as well. E.g. (just some snippets):
```ocaml
module type S = sig
  type t
  type k
  type v

  val add : k -> v -> t -> t
  val update : k -> (v option -> v option) -> t -> t
  val pop : t -> ((k * v) * t) option
end

module SearchQueue (K : Map.OrderedType) (V : Map.OrderedType) :
  S with type k = K.t and type v = V.t = struct
  module M = Map.Make (K)

  module S = Set.Make (struct
    type t = V.t * K.t

    let compare (v1, k1) (v2, k2) =
      match V.compare v1 v2 with 0 -> K.compare k1 k2 | c -> c
  end)

  type k = K.t
  type v = V.t

  (* the map answers point lookups, the set answers "minimum" queries *)
  type t = {m: v M.t; s: S.t}

  let add k v t = {m= M.add k v t.m; s= S.add (v, k) t.s}

  let update k f t =
    match M.find_opt k t.m with
    | Some v -> (
        let s = S.remove (v, k) t.s in
        match f (Some v) with
        | Some v' ->
            {m= M.add k v' t.m; s= S.add (v', k) s}
        | None ->
            {m= M.remove k t.m; s}
      )
    | None ->
        t

  let pop t =
    match S.min_elt_opt t.s with
    | Some (v, k) ->
        Some ((k, v), {m= M.remove k t.m; s= S.remove (v, k) t.s})
    | None ->
        None
end

module HostKey = struct
  type t = string

  let compare = compare
end

module HostPlanState = struct
  type t = int * int (* (vm_cnt, free_mem) *)

  let compare = compare (* order by vm_cnt first, then free memory *)
end

module HostSq = SearchQueue (HostKey) (HostPlanState)
module VMGroupMap = Map.Make (String)

let update_all_plan_states h group' vm_size all_states =
  (* the group being planned also gets its VM count bumped *)
  let update' = Option.map (fun (cnt, free) -> (cnt + 1, free - vm_size)) in
  let update'' = Option.map (fun (cnt, free) -> (cnt, free - vm_size)) in
  VMGroupMap.mapi
    (fun group sq ->
      if group = group' then HostSq.update h update' sq
      else HostSq.update h update'' sq
    )
    all_states
```
It's not necessary to use data 1 and data 2 above. Instead, for each VM group, they are wrapped into a SearchQueue.

Thanks for the suggestion, discussion and review!
ocaml/xapi/xapi_ha_vm_failover.ml (outdated):

```ocaml
hosts
|> HostsSet.remove (vm_cnt, host_size, host)
|>
match grp = group with
| true ->
    HostsSet.add (vm_cnt + 1, updated_host_size, host)
| false ->
    HostsSet.add (vm_cnt, updated_host_size, host)
```
It took me a while to figure out that below is a function:

```ocaml
match grp = group with
| true ->
    HostsSet.add (vm_cnt + 1, updated_host_size, host)
| false ->
    HostsSet.add (vm_cnt, updated_host_size, host)
```

I think we can refactor it to be more explicit:

```ocaml
let hosts = HostsSet.remove (vm_cnt, host_size, host) hosts in
match grp = group with
| true ->
    HostsSet.add (vm_cnt + 1, updated_host_size, host) hosts
| false ->
    HostsSet.add (vm_cnt, updated_host_size, host) hosts
```
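An alternative sketch of the same logic that keeps the pipeline but names the partially applied function, so it no longer reads as a bare match:

```ocaml
(* name the add step, then keep the original pipeline shape *)
let add_host =
  if grp = group then
    HostsSet.add (vm_cnt + 1, updated_host_size, host)
  else
    HostsSet.add (vm_cnt, updated_host_size, host)
in
hosts |> HostsSet.remove (vm_cnt, host_size, host) |> add_host
```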
ocaml/xapi/xapi_ha_vm_failover.ml (outdated):

```ocaml
let update_host_size_map host vm_size host_size_map =
  Xapi_vm_helpers.HostMap.update host
    (fun host_size -> Int64.sub (host_size |> Option.get) vm_size |> Option.some)
```
Can host_size be None?
No, the `Option.get` is there to make sure it always exists in the map.
It will be `None` if `host` is not in the `host_size_map`, and then `Option.get` will raise an exception. I think we are relying on `init_host_size_map` to have added the host here? It would be better to not risk raising exceptions here, as they won't be caught.
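For illustration, a minimal exception-free variant: with `Option.map`, a missing host leaves the map unchanged instead of raising (assuming the same `HostMap` as in the PR):

```ocaml
let update_host_size_map host vm_size host_size_map =
  Xapi_vm_helpers.HostMap.update host
    (* None stays None: no Option.get, so a missing host cannot raise *)
    (Option.map (fun host_size -> Int64.sub host_size vm_size))
    host_size_map
```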
Changed to the Psq version, please help to review again, thanks!
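For reference, a minimal sketch of the shape this takes with the psq package; the module and function names below are illustrative assumptions, not the PR's code:

```ocaml
(* Requires the psq package: priorities are (vm_cnt, free_mem), so the
   minimum is the host with the fewest group VMs, then least free memory. *)
module HostQueue = Psq.Make
  (String)
  (struct
    type t = int * int64

    let compare = Stdlib.compare
  end)

(* Pick the best host that can fit the VM; hosts that cannot fit it are
   dropped from the returned queue, mirroring the recursion in the PR. *)
let rec place_one vm_size q =
  match HostQueue.pop q with
  | Some ((host, (vm_cnt, free_mem)), rest) ->
      if vm_size <= free_mem then
        Some (host, HostQueue.add host (vm_cnt + 1, Int64.sub free_mem vm_size) rest)
      else
        place_one vm_size rest
  | None ->
      None
```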
ocaml/xapi/xapi_ha_vm_failover.ml (outdated):

```ocaml
      (Option.value ~default:Xapi_vm_helpers.HostMap.empty host_vm_cnt_map)
    )
  )
  grp_host_vm_cnt_map
```
The `Option.get` will raise an exception if the `host_vm_cnt_map` is `None`, i.e. the group is not in the map.
ocaml/xapi/xapi_ha_vm_failover.ml (outdated):

```ocaml
| Some (host, (_vm_cnt, h_size)) -> (
  match vm_size <= h_size with
  | true ->
      Some host
  | _ ->
      hosts_queue
      |> AntiAffinityEvacuationPlanHostQueue.remove host
      |> select_host_for_spread_evenly_plan vm vm_size
)
```
How about using a guard here:
```ocaml
| Some (host, (_vm_cnt, h_size)) when vm_size <= h_size ->
    Some host
| Some (host, _) ->
    hosts_queue
    |> AntiAffinityEvacuationPlanHostQueue.remove host
    |> select_host_for_spread_evenly_plan vm vm_size
```
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Host evacuation plan with anti-affinity support will be carried out in 3 steps: 1. Try to get a "spread evenly" plan for anti-affinity VMs; done if all the rest of the VMs get planned using binpack, otherwise continue. 2. Try to get a "no breach" plan for anti-affinity VMs; done if all the rest of the VMs get planned using binpack, otherwise continue. 3. Carry out a binpack plan ignoring VM anti-affinity. Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
Add "keep_localhost" to enable test pool with more than 3 hosts. Signed-off-by: Gang Ji <gang.ji@citrix.com>
Signed-off-by: Gang Ji <gang.ji@citrix.com>
```ocaml
(*****************************************************************************************************)
(* Planning code follows *)

let rec select_host_for_spread_evenly_plan vm vm_size hosts_queue =
```
Merge this one and `select_host_for_no_breach_plan` below.
```ocaml
   2. For each anti-affinity VM, select a host which can run it, and which has the minimal number of
      VMs in the same anti-affinity group running on it, for the hosts with the same number of running
      VMs in that group, pick the one with the minimal free memory. *)
let rec compute_spread_evenly_plan ~__context pool_state anti_affinity_vms
```
Change to `List.fold_left`.
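For illustration, a sketch of the `List.fold_left` shape this could take, threading the pool state and the accumulated plan through the anti-affinity VMs; `update_pool_state` and the exact accumulator are assumptions, not the PR's code:

```ocaml
let compute_spread_evenly_plan pool_state anti_affinity_vms =
  List.fold_left
    (fun (state, plan) (vm, vm_size) ->
      match select_host_for_spread_evenly_plan vm vm_size state with
      | Some host ->
          (* account for the placement, and record it in the plan *)
          (update_pool_state host vm_size state, (vm, host) :: plan)
      | None ->
          (state, plan)
    )
    (pool_state, []) anti_affinity_vms
```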
```ocaml
   2. For each anti-affinity VM, try to select a host for it so that there are at least 2 hosts which
      has running VMs in the same anti-affinity group. If there are already 2 hosts having running VMs
      in that group, skip planning for the VM. *)
let rec compute_no_breach_plan ~__context total_hosts pool_state
```
Change to `List.fold_left`.
Check `total_hosts` before the plan.
debug "Plan OK" ; plan | ||
) | ||
in | ||
plan_in_steps ~__context |
Change to a monad.
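A sketch of what this could look like, chaining the three steps with an "or else" combinator on option; the step function names are illustrative assumptions:

```ocaml
(* keep the first plan that succeeds, computing the next step lazily *)
let ( <|> ) plan next = match plan with Some _ -> plan | None -> next ()

let plan_in_steps ~__context config =
  spread_evenly_plan ~__context config
  <|> (fun () -> no_breach_plan ~__context config)
  <|> (fun () -> Some (binpack_plan ~__context config))
```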
```ocaml
(Option.map (fun (_vm_cnt, h_size) ->
     (_vm_cnt, Int64.sub h_size vm_size)
 )
```
`_vm_cnt` should be `vm_cnt`.
```ocaml
   2. Try to get a "no breach" plan for anti-affinity VMs, and then a binpack plan for the
      rest of VMs. Done if every VM got planned, otherwise continue.
   3. Carry out a binpack plan ignoring VM anti-affinity. *)
let compute_anti_affinity_evacuation_plan ~__context total_hosts hosts vms =
  let config =
    {Binpack.hosts; vms; placement= []; total_hosts; num_failures= 1}
```
Why is `num_failures` 1, or does it just start from 1?
I think it is used by binpack for the HA plan only, not for host evacuation.
Continues here: #5652