
KVM cluster with NFS primary storage – VM HA not working when host is powered down #11627

@akoskuczi-bw

Description

Problem

In a KVM cluster with NFS primary storage, VM HA does not work when a host is powered down.

  • The host status transitions to Down and its HA state shows Fenced.
  • VMs from the powered-down host are not restarted on other available hosts in the cluster.
  • Both Host HA and VM HA are enabled.
  • Out-of-band (OOB) management driver: IPMI.

Expected behavior

VMs from the failed host should be restarted on other available hosts in the cluster.

Actual behavior

  • The host goes to Down and its HA state becomes Fenced.
  • VMs are not restarted on other hosts.
  • The management server log shows a NoTransitionException.

Relevant log snippet

WARN [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
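
The warning comes from the HA finite state machine: as the stack trace shows, StateMachine2 resolves the next HA state from a (current state, event) transition table and raises NoTransitionException when no edge is registered for that pair. The sketch below is not the actual HAManagerImpl code; only the Fenced state and the Ineligible event are taken from the log, the other state and event names and the edges are illustrative. It just shows the mechanism: with no transition defined out of Fenced for Ineligible, each background poll hits the same dead end.

import java.util.HashMap;
import java.util.Map;

public class HaFsmSketch {
    // Illustrative subset of HA states and events; only Fenced and Ineligible
    // are taken from the log above.
    enum HaState { Available, Suspect, Fencing, Fenced, Ineligible }
    enum HaEvent { Eligible, Ineligible, HealthCheckFailed, Fenced }

    static final Map<String, HaState> TRANSITIONS = new HashMap<>();
    static {
        // A handful of illustrative edges; the real table is built in the HA framework.
        TRANSITIONS.put(key(HaState.Available, HaEvent.Ineligible), HaState.Ineligible);
        TRANSITIONS.put(key(HaState.Available, HaEvent.HealthCheckFailed), HaState.Suspect);
        TRANSITIONS.put(key(HaState.Fencing, HaEvent.Fenced), HaState.Fenced);
        // Note: no entry for (Fenced, Ineligible) -- the pair seen in the log.
    }

    static String key(HaState s, HaEvent e) {
        return s + "->" + e;
    }

    static HaState nextState(HaState current, HaEvent event) {
        HaState next = TRANSITIONS.get(key(current, event));
        if (next == null) {
            // Mirrors the effect of com.cloud.utils.fsm.NoTransitionException
            throw new IllegalStateException(
                    "Unable to transition to a new state from " + current + " via " + event);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(nextState(HaState.Available, HaEvent.Ineligible)); // prints Ineligible
        System.out.println(nextState(HaState.Fenced, HaEvent.Ineligible));    // throws, as in the log
    }
}

Running the sketch prints Ineligible for the first call and throws on the second, matching the (Fenced, Ineligible) pair in the warning above.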

Environment

  • CloudStack version: 4.20.1.0
  • Hypervisor: KVM
  • Primary storage: NFS
  • HA settings: Host HA enabled, VM HA enabled, OOB driver = IPMI

Steps to reproduce

1. Enable Host HA and VM HA in a KVM cluster with NFS primary storage.
2. Power off a host that is running VMs.
3. Observe host and VM states in the management server (see the polling sketch after these steps).
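
For step 3, host and VM states can be watched in the UI, with CloudMonkey, or with a direct API call. Below is a minimal Java sketch that polls listHosts once using standard CloudStack request signing; the endpoint, credentials, and the choice of listHosts/type=Routing are assumptions added for illustration, not part of the original report.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

public class HostStatePoller {
    // Hypothetical endpoint and credentials -- replace with real values.
    static final String ENDPOINT = "https://mgmt.example.com/client/api";
    static final String API_KEY = "<api-key>";
    static final String SECRET_KEY = "<secret-key>";

    public static void main(String[] args) throws Exception {
        // Parameters for a listHosts call; TreeMap keeps them sorted for signing.
        Map<String, String> params = new TreeMap<>();
        params.put("command", "listHosts");
        params.put("type", "Routing");
        params.put("response", "json");
        params.put("apikey", API_KEY);

        String query = buildQuery(params);
        String url = ENDPOINT + "?" + query + "&signature=" + encode(sign(query, SECRET_KEY));

        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        // The host "state" field (and HA details where the release exposes them)
        // can be read from the JSON body.
        System.out.println(resp.body());
    }

    // Sorted key=value pairs joined by "&", with URL-encoded values.
    static String buildQuery(Map<String, String> sorted) {
        StringBuilder sb = new StringBuilder();
        sorted.forEach((k, v) -> sb.append(sb.length() == 0 ? "" : "&")
                .append(k).append("=").append(encode(v)));
        return sb.toString();
    }

    // Standard CloudStack signing: lowercase the sorted query string,
    // HMAC-SHA1 with the secret key, then Base64.
    static String sign(String query, String secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(query.toLowerCase().getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}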

What to do about it?

No response
