Large number of open files causes issues #130

@Champ-Goblem

We are seeing several issues caused by the number of files this crate keeps open when running in Nydus.

The first issue affects workloads that perform a large number of filesystem operations: the longer the pod runs, the more file descriptors accumulate. Nydus sets the rlimit on the host, but on some systems this is capped at 2^20 (1048576) and cannot go any higher. We have seen this push a workload into a constant crash loop from which it cannot recover unless the pod is deleted and recreated. The pod repeatedly reports OSError: [Errno 24] Too many open files, yet the workload inside the VM is not itself reaching the descriptor limit.

When inspecting the nr-open count and comparing this to the ulimit within the Linux namespace for the pod on the host node, we see that nr-open is maxed out at the ulimit value. The majority of these files are currently in an open state under the Nydus process.
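For anyone wanting to reproduce the comparison above, here is a minimal Rust sketch of one way to do it on Linux: count the entries under /proc/<pid>/fd and read the soft "Max open files" value from /proc/<pid>/limits. The function names are hypothetical and this is not how the numbers in this issue were gathered; it only assumes the standard procfs layout.

```rust
use std::fs;
use std::io;

/// Count the entries in /proc/<pid>/fd (Linux only).
/// Each entry corresponds to one open file descriptor of that process.
/// When inspecting the current process, the directory handle used for
/// the listing is itself counted.
fn count_open_fds(pid: u32) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(format!("/proc/{pid}/fd"))? {
        entry?; // propagate errors such as permission problems
        count += 1;
    }
    Ok(count)
}

/// Parse the soft "Max open files" limit from the text of /proc/<pid>/limits.
/// Returns None if the line is missing or the value is not a number
/// (for example "unlimited").
fn parse_soft_nofile(limits: &str) -> Option<u64> {
    limits
        .lines()
        .find(|line| line.starts_with("Max open files"))?
        .split_whitespace()
        .nth(3)? // fields: "Max", "open", "files", <soft>, <hard>, <units>
        .parse()
        .ok()
}

/// Read and parse the soft open-file limit of a process.
fn soft_nofile_limit(pid: u32) -> io::Result<Option<u64>> {
    let text = fs::read_to_string(format!("/proc/{pid}/limits"))?;
    Ok(parse_soft_nofile(&text))
}
```

Calling `count_open_fds` and `soft_nofile_limit` with the PID of the Nydus process on the host node and comparing the two values should show how close the process is to its limit.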

Having so many files constantly open also causes the kubelet CPU usage to increase drastically. This is because the kubelet runs cadvisor, which collects metrics on the open file descriptors and on the type of each descriptor (e.g. whether it is a socket or a file). We recently opened an issue with cadvisor about this metric collection (google/cadvisor#3233), but it would be good to try to solve the problem at the source.

I assume the open file descriptors are “cached” to reduce the overhead of executing the open syscall? If so, is there a way to automatically close a file descriptor that is not used often? For example, a timeout on the descriptor: if it has not been accessed for x amount of time it gets closed, and it is reopened when it is needed again.
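To make the timeout idea concrete, here is a minimal Rust sketch of an idle-eviction cache. Everything in it is hypothetical (`IdleFdCache`, `Entry`, keying by path, reopening with `File::open`) and none of it reflects how the crate actually stores its descriptors. A passthrough filesystem may need a more robust way to re-identify a file than its path, so treat this only as an illustration of the eviction logic.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// One cached handle plus the time it was last handed out.
struct Entry {
    file: File,
    last_used: Instant,
}

/// Hypothetical cache keyed by path; not the crate's actual data structure.
struct IdleFdCache {
    entries: HashMap<PathBuf, Entry>,
    idle_timeout: Duration,
}

impl IdleFdCache {
    fn new(idle_timeout: Duration) -> Self {
        Self { entries: HashMap::new(), idle_timeout }
    }

    /// Return the cached handle, reopening the file if it was evicted
    /// or never opened. `now` is passed in so the logic is easy to test.
    fn get(&mut self, path: &Path, now: Instant) -> io::Result<&File> {
        if !self.entries.contains_key(path) {
            let file = File::open(path)?;
            self.entries
                .insert(path.to_path_buf(), Entry { file, last_used: now });
        }
        let entry = self.entries.get_mut(path).expect("entry was just inserted");
        entry.last_used = now;
        Ok(&entry.file)
    }

    /// Drop every handle that has been idle for at least `idle_timeout`.
    /// Dropping a `File` closes its descriptor. Returns how many were closed.
    fn evict_idle(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        let timeout = self.idle_timeout;
        self.entries
            .retain(|_, e| now.duration_since(e.last_used) < timeout);
        before - self.entries.len()
    }

    /// Number of handles currently held open by the cache.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A background task could call `evict_idle(Instant::now())` periodically. The trade-off is an extra open syscall the next time an evicted file is accessed, in exchange for a bounded number of idle descriptors.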

Any thoughts or ideas would be greatly appreciated.
