Passenger creates an unnecessary amount of threads on systems with many CPUs #275

@CSC-swesters

Passenger spawns worker threads in each per-user Nginx (PUN) instance based on the number of cores on the system where OOD runs. This behavior probably makes sense when running only a single web server instance. However, because OOD launches one Nginx instance per user, a deployment with hundreds of users ends up with a very large number of threads.

Having this many threads causes a few problems. Passenger monitors the CPU and memory usage of all these threads by running ps(1) regularly, which creates a high system load on the OOD host. We regularly see load averages spike into the thousands, and we suspect Passenger is to blame. Using perf(1) at the tail end of one such load spike, we could see that PassengerAgent accounts for a large share of the context switches:

[root@host ~]# perf record -e context-switches -c 1 -a
^C

[root@host ~]# perf report
Samples: 409K of event 'context-switches', Event count (approx.): 409815
Overhead  Command          Shared Object      Symbol
  26,68%  PassengerAgent   [kernel.kallsyms]  [k] schedule
...

For context, one of our clusters runs OOD on a 12-core VM with several hundred users. At the time of writing, there were 722 PUNs, 1461 PassengerAgent processes, and 26754 PassengerAgent threads.

[root@host ~]# ps aux | grep "nginx" | grep -c "master"
722
[root@host ~]# pstree -np > pstree.txt
[root@host ~]# grep -cF -e "-PassengerAgent(" pstree.txt
1461
[root@host ~]# grep -cF -e "-{PassengerAgent}(" pstree.txt
26754

We investigated the issue and found that Passenger ultimately uses the Boost thread::hardware_concurrency() function (link to source code) to decide how many threads to spawn for its agent.

It's possible that Passenger has some knobs for tuning its parallelism, but we suspect they are under the Passenger Enterprise offering. Luckily, the Passenger source code is MIT licensed, so we are able to patch it to better suit OOD's architecture.

We were able to patch this via the ondemand-passenger rpmbuild spec so that Passenger accepts an environment variable which, when it is set to a valid positive integer, overrides the default behavior. If you are interested in upstreaming this patch, we will gladly open a pull request.
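
A minimal sketch of that decision rule, written in shell rather than Passenger's C++ and not taken from the actual patch. The variable name `PASSENGER_CONCURRENCY_OVERRIDE` is a placeholder invented for this example, and `nproc` only stands in for whatever core count Passenger detects by default.

```shell
#!/bin/sh
# Sketch of the override rule only; the real patch changes Passenger's C++ code.
# PASSENGER_CONCURRENCY_OVERRIDE is a hypothetical name, not the variable
# used by the patch.

resolve_concurrency() {
    override="${PASSENGER_CONCURRENCY_OVERRIDE:-}"
    case "$override" in
        '' | *[!0-9]*)
            # Unset, empty, or containing anything other than digits:
            # fall back to the detected core count.
            nproc
            ;;
        *)
            if [ "$override" -gt 0 ]; then
                echo "$override"    # valid positive integer: use the override
            else
                nproc               # zero is not a valid override
            fi
            ;;
    esac
}

PASSENGER_CONCURRENCY_OVERRIDE=2 resolve_concurrency   # prints 2
```

Falling back to the default on any invalid value keeps a typo in the environment from changing behavior silently in the other direction, for example by spawning zero threads.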

With this patch and the override set to 2 (instead of the default of 12), we reduce the number of threads per PUN by 20. As a benchmark, we restarted the instance and then logged into the dashboard. The pstree -u username output for each case is shown below:

# Before (12 threads)
PassengerAgent─┬─PassengerAgent─┬─ruby───2*[{ruby}]
               │                └─33*[{PassengerAgent}]
               └─4*[{PassengerAgent}]
# After (override as 2 threads)
PassengerAgent─┬─PassengerAgent─┬─ruby───2*[{ruby}]
               │                └─13*[{PassengerAgent}]
               └─4*[{PassengerAgent}]
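
To put the per-PUN difference in perspective, the arithmetic can be sketched as below. It uses the thread counts from the pstree output above and the 722 PUNs counted earlier, and it assumes every PUN looks like the freshly restarted benchmark PUN, which a real host will only approximate.

```shell
#!/bin/sh
# Figures copied from the pstree output above and the PUN count quoted earlier.
threads_before=33   # {PassengerAgent} threads in the inner agent, 12 cores
threads_after=13    # same process with the override set to 2
puns=722            # PUNs counted on the host at the time of writing

saved_per_pun=$((threads_before - threads_after))
# Assumption: every PUN resembles the benchmarked one (restart, then one login).
saved_total=$((saved_per_pun * puns))

echo "saved per PUN: $saved_per_pun"             # saved per PUN: 20
echo "saved across $puns PUNs: $saved_total"     # saved across 722 PUNs: 14440
```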

On systems with even more CPU threads available, the improvement is naturally even greater. Running OOD on a host with 256 hardware threads will result in a pstree that looks like this:

# 256 threads
PassengerAgent─┬─PassengerAgent─┬─ruby───2*[{ruby}]
               │                └─525*[{PassengerAgent}]
               └─4*[{PassengerAgent}]

In other words, with this patch and hardware_concurrency overridden to 2, each PUN on such a host could run about 508 fewer threads 🙂
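
The 508 figure is consistent with a linear reading of the two measurements from the 12-core host, sketched below. The linear model is an assumption, not something confirmed from the Passenger source: it predicts 521 agent threads at 256 hardware threads, while the pstree output above shows 525, so the real saving may differ by a few threads.

```shell
#!/bin/sh
# Two measured points from the 12-core host: (concurrency, agent threads).
c1=2;  t1=13
c2=12; t2=33

per_core=$(( (t2 - t1) / (c2 - c1) ))   # threads added per hardware thread
fixed=$(( t1 - per_core * c1 ))         # threads independent of core count

# Assumed linear model: threads = fixed + per_core * concurrency
predict() { echo $(( fixed + per_core * $1 )); }

echo "per core: $per_core, fixed: $fixed"     # per core: 2, fixed: 9
echo "predicted at 256: $(predict 256)"       # predicted at 256: 521
echo "saved at 256 with override 2: $(( $(predict 256) - $(predict 2) ))"   # 508
```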

Let us know what you think!
