
NDM looping constantly causing high CPU usage with Error: unreachable state #674


Description

@magnetised

What steps did you take and what happened:
I've just installed OpenEBS as part of k0s on an AWS EC2 instance with two disks: the host disk and a separate EBS data disk. Everything seems to be working fine, but one of the NDM pods sits at a constant 20% CPU usage. Looking at the logs, it appears to be stuck in a loop querying the host/node disks.

On another server with the same NDM version but a simpler, single-disk setup, the exact same thing is happening.
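
Roughly how I'm observing this (assuming metrics-server is available for kubectl top):

# Sustained CPU usage of the NDM pod
kubectl top pod -n openebs openebs-ndm-jpvpw

# Follow the NDM logs; the same probe/update cycle repeats continuously
kubectl logs -f -n openebs openebs-ndm-jpvpw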

What did you expect to happen:
I expected the NDM process not to be consuming CPU in a constant loop.

The output of the following commands will help us better understand what's going on:
[Pasting long output into a GitHub gist or other pastebin is fine.]

  • kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS      AGE
openebs-localpv-provisioner-6ccc9d6fc9-kcnhs   1/1     Running   9 (19h ago)   20h
openebs-ndm-jpvpw                              1/1     Running   0             26m
openebs-ndm-operator-7bd6898d96-vz54r          1/1     Running   9 (19h ago)   20h

  • kubectl get blockdevices -n openebs -o yaml
apiVersion: v1
items:
- apiVersion: openebs.io/v1alpha1
  kind: BlockDevice
  metadata:
    annotations:
      internal.openebs.io/uuid-scheme: gpt
    creationTimestamp: "2022-07-05T13:22:38Z"
    generation: 20
    labels:
      kubernetes.io/hostname: ip-172-31-18-163.eu-west-1.compute.internal
      ndm.io/blockdevice-type: blockdevice
      ndm.io/managed: "true"
    name: blockdevice-01fd0d0d966998648102985c5f12e22a
    namespace: openebs
    resourceVersion: "64236"
    uid: 9d3e2ec3-57b5-4303-829c-e0cfa51f2f07
  spec:
    capacity:
      logicalSectorSize: 512
      physicalSectorSize: 512
      storage: 137437888000
    details:
      compliance: ""
      deviceType: partition
      driveType: SSD
      firmwareRevision: ""
      hardwareSectorSize: 512
      logicalBlockSize: 512
      model: Amazon Elastic Block Store
      physicalBlockSize: 512
      serial: vol033aa51d4508ed1b0
      vendor: ""
    devlinks:
    - kind: by-id
      links:
      - /dev/disk/by-id/nvme-nvme.1d0f-766f6c3033336161353164343530386564316230-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1
      - /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol033aa51d4508ed1b0-part1
      - /dev/disk/by-id/wwn-nvme.1d0f-766f6c3033336161353164343530386564316230-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1
    - kind: by-path
      links:
      - /dev/disk/by-path/pci-0000:00:1f.0-nvme-1-part1
    filesystem:
      fsType: xfs
      mountPoint: /var/openebs
    nodeAttributes:
      nodeName: ip-172-31-18-163.eu-west-1.compute.internal
    partitioned: "No"
    path: /dev/nvme1n1p1
  status:
    claimState: Unclaimed
    state: Inactive
kind: List
metadata:
  resourceVersion: ""
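
Note the generation: 20 on a blockdevice created only hours earlier, which suggests the object is being rewritten over and over. One way to confirm the churn (a diagnostic suggestion, not output I've captured) is to watch the objects:

# A steady stream of update lines for the same object indicates
# NDM is rewriting it in a loop
kubectl get blockdevices -n openebs -w
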
  • kubectl get blockdeviceclaims -n openebs -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""

  • kubectl logs <ndm daemon pod name> -n openebs

The gist below shows just two iterations of the loop; it goes on like this indefinitely.

https://gist.github.com/magnetised/c1f2bef4242b663721d87898f8416d65
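
To give a rough sense of the rate (an untested check on my part; the match string is taken from the error in the title):

# Count occurrences of the repeated error over the last minute
kubectl logs -n openebs openebs-ndm-jpvpw --since=60s | grep -c 'unreachable state'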

  • lsblk from nodes where ndm daemonset is running
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme1n1     259:0    0  128G  0 disk
└─nvme1n1p1 259:4    0  128G  0 part /var/openebs
nvme0n1     259:1    0  128G  0 disk
├─nvme0n1p1 259:2    0    1M  0 part
└─nvme0n1p2 259:3    0  128G  0 part /
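
In case the loop is driven by block-device events or by how NDM probes the devices, the udev view from the node might be relevant. These are commands I haven't run yet; they need to execute on the node itself (or from a privileged debug pod):

# Dump the udev properties of the OpenEBS partition, roughly what NDM's probe sees
udevadm info --query=property /dev/nvme1n1p1

# Watch for repeated block-subsystem events that could retrigger NDM
udevadm monitor --udev --subsystem-match=block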

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • OpenEBS version

openebs.io/version=3.0.0
node-disk-manager:1.7.0

  • Kubernetes version (use kubectl version):
Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.23.6+k0s
  • Kubernetes installer & version:

K0s version v1.23.6+k0s.0

  • Cloud provider or hardware configuration:

AWS EC2 instance

  • Type of disks connected to the nodes (eg: Virtual Disks, GCE/EBS Volumes, Physical drives etc)

host root disk nvme0n1
OpenEBS data volume nvme1n1, with a single partition nvme1n1p1 mounted at /var/openebs

  • OS (e.g. from /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
