Description
problem
In a KVM cluster with NFS primary storage, VM HA does not work when a host is powered down.
- The host status transitions to Down, HA state shows Fenced.
- VMs from the powered-down host are not restarted on other available hosts in the cluster.
- Both Host HA and VM HA are enabled.
- OOB driver: IPMI.
Expected behavior
VMs from the failed host should be restarted on other available hosts in the cluster.
Actual behavior
- Host goes to Down and HA state shows Fenced.
- VMs are not started elsewhere.
- Management server logs show a NoTransitionException.
Relevant log snippet
WARN [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
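Likely cause (sketch)
The warning indicates that the HA finite-state machine has no outgoing transition from the Fenced state for the Ineligible event, so the background poll task cannot move the host to a new HA state. Below is a minimal, self-contained Java sketch of such a (state, event) -> next-state lookup; it is not CloudStack's StateMachine2 code, and the state names, event names, and edges are illustrative assumptions only.

import java.util.EnumMap;
import java.util.Map;

// Minimal illustration of a (state, event) -> next-state lookup.
// NOT CloudStack's StateMachine2; states, events, and edges are hypothetical.
public class HaFsmSketch {
    enum HaState { Available, Suspect, Fencing, Fenced }
    enum HaEvent { HealthCheckFailed, Recovered, FenceSucceeded, Ineligible }

    private static final Map<HaState, Map<HaEvent, HaState>> TRANSITIONS = new EnumMap<>(HaState.class);
    static {
        addEdge(HaState.Available, HaEvent.HealthCheckFailed, HaState.Suspect);
        addEdge(HaState.Suspect, HaEvent.Recovered, HaState.Available);
        addEdge(HaState.Fencing, HaEvent.FenceSucceeded, HaState.Fenced);
        // No edge is registered from Fenced for Ineligible, so that lookup fails below.
    }

    private static void addEdge(HaState from, HaEvent on, HaState to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(HaEvent.class)).put(on, to);
    }

    static HaState next(HaState current, HaEvent event) {
        HaState next = TRANSITIONS.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            // Mirrors the "Unable to transition to a new state from Fenced via Ineligible" error.
            throw new IllegalStateException("Unable to transition from " + current + " via " + event);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(next(HaState.Available, HaEvent.HealthCheckFailed)); // prints Suspect
        System.out.println(next(HaState.Fenced, HaEvent.Ineligible));           // throws
    }
}

If the real transition table behaves the same way, each poll cycle that raises an Ineligible-style event for an already-fenced host would only log this warning rather than progress the HA workflow, which appears consistent with the VMs never being restarted elsewhere.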
versions
Environment
- CloudStack version: 4.20.1.0
- Hypervisor: KVM
- Primary storage: NFS
- HA settings: Host HA enabled, VM HA enabled, OOB driver = IPMI
The steps to reproduce the bug
1. Enable Host HA and VM HA in a KVM cluster (NFS primary storage).
2. Power off a host that runs VMs.
3. Observe host and VM states in the management server.
What to do about it?
No response