This applies at least to tmpfs; I have not checked what happens with zfs.
A per-builder fs structure is perfectly fine and encouraged; the problem is the way it is currently done.
The commonly executed binaries are literally megabytes in size. With compiles running in parallel this keeps busting CPU caches. This would not happen if the underlying vnodes were the same.
Important remark about NUMA-awareness goes here. This is a factor on the official builders as well.
The core idea for handling this is a shared tmpfs mount hosting the different jails, with the files hardlinked between them.
An unsuspecting user might think nullfs would do a great job here, but that's not true due to its overhead.
Suppose the system has 2 NUMA nodes (numbered 0 and 1) and the builders are hanging out in /poudriere/builders. In that case:
cpuset -l <cpus-from-domain-0> mount -t tmpfs tmpfs /poudriere/builders/node0
cpuset -l <cpus-from-domain-0> mkdir /poudriere/builders/node0/basefs
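To fill in <cpus-from-domain-0>, the per-domain CPU list can be queried from the kernel. A minimal sketch, assuming a cpuset(1) recent enough to accept -d (present in recent FreeBSD releases):

```sh
# Print the CPU mask of NUMA domain 0; this is what the -l argument above expects.
cpuset -g -d 0
```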
Here unpack the base system into the basefs dir. Still ignore /usr/share, /usr/tests, and whatever else is applicable; these can be null-mounted from a machine-wide place.
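For those machine-wide pieces, a read-only null mount into each builder (done once the builder directories from the steps below exist) might look like this; the exact paths are just an assumption:

```sh
# Share the host's /usr/share with a builder instead of copying it into every tmpfs.
mount -t nullfs -o ro /usr/share /poudriere/builders/node0/builder0/usr/share
```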
cpuset -l <cpus-from-domain-0> mkdir /poudriere/builders/node0/builder0
Here recreate the directory tree as seen in basefs with mkdir of everything, then create hardlinks to all files (e.g., ln basefs/bin/sh builder0/bin/sh). A sketch tying these per-builder steps together follows below.
Symlinks have to be recreated as they are found in basefs (e.g., a symlink to "../crap" has to remain a symlink to "../crap", not become a hardlink). Repeat for all domains and all builders.
Finally, set the schg flag on all files in basefs and on all directories (modulo etc and similar) in the builders to prevent ports from messing with them.
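Putting the per-builder steps together, a minimal sketch (the paths, the CPU list, and skipping etc during the schg pass are assumptions, not the exact procedure used):

```sh
#!/bin/sh
# Populate one builder from basefs: recreate directories, hardlink regular
# files, recreate symlinks, then lock things down with schg.
BASE=/poudriere/builders/node0/basefs     # assumed layout
BLDR=/poudriere/builders/node0/builder0   # assumed layout
CPUS=0-13                                 # CPUs from NUMA domain 0 (assumed)

cpuset -l "$CPUS" mkdir -p "$BLDR"

# Recreate the directory tree as seen in basefs.
( cd "$BASE" && find . -type d ) | while read -r d; do
        cpuset -l "$CPUS" mkdir -p "$BLDR/$d"
done

# Hardlink all regular files so every builder shares the same vnodes.
( cd "$BASE" && find . -type f ) | while read -r f; do
        cpuset -l "$CPUS" ln "$BASE/$f" "$BLDR/$f"
done

# Symlinks are recreated with the same target rather than hardlinked.
( cd "$BASE" && find . -type l ) | while read -r l; do
        cpuset -l "$CPUS" ln -s "$(readlink "$BASE/$l")" "$BLDR/$l"
done

# schg on all files in basefs and on the builder's directories
# (skipping etc here as one example of "modulo etc and similar").
find "$BASE" -type f -exec chflags schg {} +
( cd "$BLDR" && find . -type d ! -path './etc*' -exec chflags schg {} + )
```

In practice the whole thing would probably run under a single cpuset -l invocation per builder rather than wrapping every command, but the per-command form mirrors the commands above.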
I stress the use of cpuset to maintain kernel memory locality when creating the data structures to be used by a given builder.
Et voilà: a jail-private view of the filesystem with kernel-level data being shared.
In a simple test building hello world in a loop with clang I get over a 6% win from doing this instead of completely separate worlds. The win would be higher if it were not for lock contention in the kernel, which I'm going to look into.
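For reference, the kind of loop meant here is nothing more elaborate than repeatedly compiling a trivial program; the exact harness isn't shown in this issue, so this is just an assumed shape:

```sh
# Compile a trivial hello-world-style program in a loop to exercise the
# toolchain binaries inside a builder.
printf 'int main(void) { return 0; }\n' > hello.c
i=0
while [ "$i" -lt 1000 ]; do
        cc -O2 -c -o /dev/null hello.c
        i=$((i + 1))
done
```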
Here are memory throughput stats from pcm when issuing 54 builds in parallel.
54 separate tmpfs mounts:
|-- System Read Throughput(MB/s): 36412.07 --|
|-- System Write Throughput(MB/s): 15735.59 --|
|-- System Memory Throughput(MB/s): 52147.66 --|
Shared:
|-- System Read Throughput(MB/s): 10309.42 --|
|-- System Write Throughput(MB/s): 12702.96 --|
|-- System Memory Throughput(MB/s): 23012.38 --|
As you can see, read throughput dropped by over 3x, and that's while getting more work done.