100% CPU usage when used in kubectl exec and connection is terminated #1717

Open
@plevart

In my 3-node k8s cluster using:

root@nk8s1:~# /usr/libexec/crio/crun --version
crun version 1.20
commit: 9c9a76ac11994701dd666c4f0b869ceffb599a66
rundir: /run/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL

root@nk8s1:~# crio --version
crio version 1.32.2
   GitCommit:      318db72eb0b3d18c22c995aa7614a13142287296
   GitCommitDate:  2025-03-02T18:05:31Z
   GitTreeState:   dirty
   BuildDate:      1970-01-01T00:00:00Z
   GoVersion:      go1.23.3
   Compiler:       gc
   Platform:       linux/amd64
   Linkmode:       static
   BuildTags:
     static
     netgo
     osusergo
     exclude_graphdriver_btrfs
     seccomp
     apparmor
     selinux
     exclude_graphdriver_devicemapper
   LDFlags:          unknown
   SeccompEnabled:   true
   AppArmorEnabled:  false

root@nk8s1:~# kubelet --version
Kubernetes v1.32.2

...I noticed one crun process constantly consuming 100% CPU. Tracing its system calls revealed that it is spinning in a loop trying to write to STDOUT:

epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(7, [{events=EPOLLIN, data=0x4}], 10, -1) = 1
writev(1, [{iov_base="8\33[53d\t ", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)
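
The pattern in the trace above (a readiness wait that returns immediately, followed by a write that fails with EAGAIN) can be reproduced outside of crun. The following is a minimal sketch, not crun's actual code: it assumes a level-triggered epoll on an input descriptor that still has data pending and a non-blocking output pipe whose reader has stopped draining it. The function name `count_spins` and both pipes are hypothetical stand-ins.

```python
import os
import select

def count_spins(iterations=5):
    """Count wake-ups that end in EAGAIN when output is full and input stays readable."""
    src_r, src_w = os.pipe()   # hypothetical stand-in for the container's terminal
    out_r, out_w = os.pipe()   # hypothetical stand-in for STDOUT with a stalled reader
    os.set_blocking(out_w, False)

    # Fill the output pipe so that any further write fails with EAGAIN.
    try:
        while True:
            os.write(out_w, b"x" * 4096)
    except BlockingIOError:
        pass

    os.write(src_w, b"pending")  # input that is ready but never gets consumed

    ep = select.epoll()
    ep.register(src_r, select.EPOLLIN)  # level-triggered by default

    spins = 0
    for _ in range(iterations):
        if not ep.poll(1):       # returns at once: src_r is still readable
            break
        try:
            os.write(out_w, b"pending")
        except BlockingIOError:  # EAGAIN: the output pipe is still full
            spins += 1           # no progress was made, so the next poll wakes again

    ep.close()
    for fd in (src_r, src_w, out_r, out_w):
        os.close(fd)
    return spins
```

Every iteration wakes up and fails without making progress. With an infinite timeout and no sleep in between, as in the trace, such a loop would keep one CPU core busy.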

Investigating further, I found this hierarchy of processes (1567488 is the CPU-consuming crun):

root@nk8s1:~# pstree -sp 1567488
systemd(1)───crio(2109)───crun(1567488)───bash(1567490)───watch(1567662)

I had to kill -KILL 1567490 (the bash process) for the whole hierarchy to go away.

What I did to make this happen is the following:

  • I ran kubectl exec -it into a container, where I ran watch <some command> to constantly watch the output of some command. Then the networking from my laptop (VPN connection) broke and the kubectl process hung. I terminated it on the client. The interactive bash in the container continued to run, though, with crun trying to write to its STDOUT but instead spinning in the loop.

Should crun detect that the connection has been closed and kill the command itself?
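
As an illustration of what such detection could look like, here is a hedged sketch that assumes STDOUT is a pipe and uses a hypothetical helper `output_state`; it is not crun's implementation. On Linux, polling the write end of a pipe reports POLLERR once the reading end has been closed. Note that the trace shows EAGAIN rather than EPIPE, which suggests the reading end was still open but not being drained, so a check like this would more likely report "blocked" than "closed" in that situation.

```python
import os
import select

def output_state(fd, timeout_ms=100):
    """Classify an output fd as 'writable', 'closed', or 'blocked'."""
    poller = select.poll()
    # POLLERR and POLLHUP are always reported, even if not requested.
    poller.register(fd, select.POLLOUT)
    for _, mask in poller.poll(timeout_ms):
        if mask & (select.POLLERR | select.POLLHUP):
            return "closed"      # the reading end is gone
        if mask & select.POLLOUT:
            return "writable"    # there is room to write
    return "blocked"             # timed out: reader exists but is not draining
```

Waiting on the output descriptor like this after an EAGAIN, instead of going back to wait on the input, would also avoid the busy loop, because the wait only returns when the output becomes writable, fails, or the timeout expires.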
