This repository was archived by the owner on Dec 9, 2021. It is now read-only.

Signals + subprocesses may lead to livelock #4

@mgedmin

When you're profiling a view that uses subprocess.Popen(), it might livelock:

  • the clone() syscall (used to implement os.fork() under the hood) may take 2-3 milliseconds on my machine (as measured by strace -T while the flamegraph was off)
  • a SIGALRM arrives every 1 ms and interrupts the clone()
  • clone() returns -EINTR
  • the signal handler runs
  • clone() is then restarted
  • go back to the first step
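
The setup behind these steps can be sketched roughly as follows. This is a minimal illustration, not the profiler's actual code: the names `on_sample` and `popen_under_timer` are made up here, and arming the timer with `signal.setitimer(signal.ITIMER_REAL, ...)` is an assumption about how the 1 ms SIGALRM is produced. It is Unix-only and must run in the main thread.

```python
import signal
import subprocess
import sys

samples = 0  # number of timer ticks seen so far


def on_sample(signum, frame):
    # Stand-in for a profiler's stack sampler (hypothetical): count ticks only.
    global samples
    samples += 1


def popen_under_timer(cmd, interval):
    """Run cmd while SIGALRM fires every `interval` seconds; return its exit code."""
    previous = signal.signal(signal.SIGALRM, on_sample)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        # Process creation happens inside Popen(); this is the call that the
        # timer signal may keep interrupting when `interval` is very short.
        return subprocess.Popen(cmd).wait()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Calling `popen_under_timer([sys.executable, "-c", "pass"], 0.001)` mirrors the 1 ms interval described above. Whether it actually stalls probably depends on the kernel, the Python version, and how long process creation takes on the machine, so treat it as a way to experiment rather than a guaranteed reproduction.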

This is very clearly visible in strace output. Externally this is visible as a Django process eating 100% CPU and sometimes not making any progress.

Sometimes the view would actually finish, after an extra 10-20 seconds of this processing, which shows in the flamegraph as a call stack terminating in subprocess.Popen._wait().

I don't know if this can be fixed.

Possible mitigations: don't call subprocess.Popen() from the main thread, or use a lower sampling frequency for the SIGALRM timer.
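
A rough sketch of the first mitigation, assuming the timer signal is delivered to the main thread so that process creation in a worker thread is not interrupted (I have not verified that this holds in every case). The helper name `run_off_main_thread` and the module-level pool are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level pool; a single worker is enough for illustration.
_pool = ThreadPoolExecutor(max_workers=1)


def run_off_main_thread(cmd):
    """Run cmd via subprocess.run() in a worker thread; return CompletedProcess."""
    # The fork/exec happens in the worker thread. The calling thread only
    # blocks on the future until the child process has finished.
    future = _pool.submit(subprocess.run, cmd, capture_output=True)
    return future.result()
```

For example, `run_off_main_thread([sys.executable, "-c", "print('ok')"])` returns a `CompletedProcess` whose `stdout` holds the child's output as bytes.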
