
Spare arbiter memory consumption #1506

@dkanbier

Description


When running in a high-availability setup with a spare arbiter, the spare arbiter consumes a lot of memory and consumes even more with every restart you do on the primary arbiter.

In this graph we see the memory usage of the arbiter processes. The yellow/green lines are for the main arbiter, the orange/blue ones for the spare arbiter:

[Graph: arbiter memory usage, screenshot taken 2015-02-10 11:56:40]

Every jump in memory on the spare arbiter is caused by a reload of the master arbiter, which makes it send the new configuration to the spare arbiter. The drops in memory usage are from us restarting the spare arbiter to free up memory.

While testing I discovered there is actually a maximum number of jumps the spare arbiter takes: it's equal to the number of http threads (8 by default) the spare arbiter spawns at startup. Each time the master arbiter sends its new configuration to the spare arbiter, the request is handled by a different http thread (round-robin). Once all http threads have handled a POST request from the master, memory consumption becomes stable. No more jumps.

Now I'm not a Python programmer, but here is what I think happens:

The cPickle.loads call in the put_conf method of IForArbiter allocates the amount of memory we see per jump in the graph (let's say 1 GB). Because the http thread called this method, it holds a reference to this data, and as long as the thread exists the data can't be reclaimed by the garbage collector.

Since every POST is handled by a different http thread, you can potentially have 8 (number of threads) * 1 GB (memory needed by cPickle.loads on a large configuration) = 8 GB in use.

Once all threads have been used they get reused, releasing the pointer and reusing the claimed memory from the previous run. This causes the memory usage to stabilise.
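To make the hypothesis concrete, here is a minimal sketch of how I understand the current flow. The class and attribute names follow my reading of the code, but this is simplified illustration, not the actual implementation:

```python
import cPickle

class IForArbiter(object):
    """Simplified sketch: the HTTP-facing interface object of the arbiter."""

    def __init__(self, app):
        self.app = app  # the ArbiterDaemon instance

    def put_conf(self, conf):
        # This runs inside one of the 8 HTTP worker threads. The unpickled
        # configuration (potentially ~1 GB) is built right here, so the
        # memory ends up tied to whichever thread handled this POST.
        self.app.new_conf = cPickle.loads(conf)
```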

To counter this issue I've moved the cPickle.loads call from the put_conf method in IForArbiter to the setup_new_conf method in ArbiterDaemon. So put_conf in IForArbiter only passes its data to the ArbiterDaemon object, and the ArbiterDaemon is now responsible for calling cPickle.loads on the data.

This way I think the http thread holds no reference to the memory claimed by cPickle.loads. The unpickling now happens in the ArbiterDaemon object, to which the http threads have no connection. I've done some basic testing and this seems to have decreased memory consumption quite a bit, getting rid of the high jumps we see in the graph.
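A sketch of the proposed change, again with simplified names: put_conf only stores the raw pickled string, and the ArbiterDaemon unpickles it later in its own main loop:

```python
import cPickle

class IForArbiter(object):
    def __init__(self, app):
        self.app = app

    def put_conf(self, conf):
        # The HTTP thread only hands the raw pickled string over to the
        # daemon; no large objects are created in this thread anymore.
        self.app.new_conf = conf

class ArbiterDaemon(object):
    def setup_new_conf(self):
        # Unpickling now happens in the daemon itself, outside any HTTP
        # worker thread.
        conf = cPickle.loads(self.new_conf)
        self.new_conf = None  # drop the raw pickled string as soon as we can
        # ... continue applying the new configuration as before ...
```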

I'll need to do some more tests, but I'd like to hear what you think. Thanks!
