
Auto tuning combined bandwidth

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 8, 2021 · 4 revisions

The "PcieBandwidthBenchmarker.h" header provides the PcieBandwidthBenchmarker class. It benchmarks each physical card in the system and computes a relative multiplier for each one, so that the combined bandwidth is maximized when the cards are used together by a virtual array.

Usage:

// allow the benchmark to use 128 MB per card
PcieBandwidthBenchmarker bench(128);

// minimum number of data channels (virtual GPUs) per physical GPU;
// 2 in this example
std::vector<int> multipliers = bench.bestBandwidth(2);

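On the development machine the output is {3,4,2}. The slowest connection is on the third PCIe bridge, so that card gets the minimum of 2, and the other cards are scaled up according to their own measured data-copy performance. With a minimum of 10 the same ratios are kept: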

std::vector<int> multipliers = bench.bestBandwidth(10); // multipliers = {15,20,10}
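The relationship between the two outputs above can be sketched as a simple proportional scaling. This is only an illustration, not the code inside PcieBandwidthBenchmarker: the function name `scaleToMinimum`, its parameters, and the rounding rule are assumptions, and the bandwidth figures in the usage note are made up to reproduce the ratios shown on this page.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PcieBandwidthBenchmarker.h).
// Given one measured bandwidth per physical card, give the slowest card
// exactly minChannels and scale the others in proportion to their bandwidth.
std::vector<int> scaleToMinimum(const std::vector<double>& bandwidths, int minChannels)
{
    std::vector<int> result;
    if (bandwidths.empty())
        return result;

    const double slowest = *std::min_element(bandwidths.begin(), bandwidths.end());

    // Degenerate measurement: fall back to the same minimum for every card.
    if (slowest <= 0.0)
        return std::vector<int>(bandwidths.size(), minChannels);

    for (std::size_t i = 0; i < bandwidths.size(); ++i)
    {
        // Round to the nearest integer (an assumed rounding rule).
        const double scaled = minChannels * bandwidths[i] / slowest;
        result.push_back(static_cast<int>(std::lround(scaled)));
    }
    return result;
}
```

For example, with illustrative bandwidths of 3.0, 4.0 and 2.0 (any unit), `scaleToMinimum({3.0, 4.0, 2.0}, 2)` gives {3,4,2} and `scaleToMinimum({3.0, 4.0, 2.0}, 10)` gives {15,20,10}, matching the ratios reported above.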

The vector returned by bestBandwidth can be passed directly to the VirtualMultiArray constructor:

VirtualMultiArray<Obj> data(..,..,..,..,bench.bestBandwidth(2));

The data array then uses the specified bandwidth ratios, which map well to the communication performance of the physical cards when there are enough concurrent accesses to elements.
