Auto tuning combined bandwidth
Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 8, 2021 · 4 revisions
The "PcieBandwidthBenchmarker.h" header provides the PcieBandwidthBenchmarker class, which benchmarks each physical card in the system and finds relative multiplier constants that maximize the combined bandwidth for virtual array usage.
Usage:
// allow the benchmark to use 128 MB per card
PcieBandwidthBenchmarker bench(128);
// minimum number of data channels (virtual GPUs) per physical GPU
// example: 2 here
std::vector<int> multipliers = bench.bestBandwidth(2);
On the development machine the output array becomes {3,4,2}, because the slowest connection is on the 3rd PCIe bridge. That card gets the requested minimum (2), and the others are scaled up according to their own data-copy performance.
std::vector<int> multipliers = bench.bestBandwidth(10); // multipliers = { 15,20,10}
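The two results above ({3,4,2} for a minimum of 2 and {15,20,10} for a minimum of 10) are consistent with simple proportional scaling against the slowest card. The following is only a minimal sketch of that idea, not the library's actual implementation: `scaleMultipliers` is a hypothetical helper, and the bandwidth values passed to it are assumed to come from some per-card copy benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of PcieBandwidthBenchmarker.h).
// bandwidths : measured copy throughput per physical card, in any consistent unit
// minChannels: multiplier to assign to the slowest card
std::vector<int> scaleMultipliers(const std::vector<double>& bandwidths, int minChannels)
{
    assert(!bandwidths.empty() && minChannels > 0);

    // the slowest card is the reference point for all ratios
    const double slowest = *std::min_element(bandwidths.begin(), bandwidths.end());
    assert(slowest > 0.0);

    std::vector<int> multipliers;
    multipliers.reserve(bandwidths.size());
    for (double bw : bandwidths)
    {
        // slowest card gets exactly minChannels; faster cards get proportionally more,
        // rounded to the nearest whole channel count
        multipliers.push_back(static_cast<int>(std::lround(minChannels * bw / slowest)));
    }
    return multipliers;
}
```

With illustrative (not measured) relative bandwidths of 1.5, 2.0 and 1.0, `scaleMultipliers({1.5, 2.0, 1.0}, 2)` gives {3,4,2} and `scaleMultipliers({1.5, 2.0, 1.0}, 10)` gives {15,20,10}, matching the ratios shown above.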
The result can then be passed directly to the VirtualMultiArray constructor, like this:
VirtualMultiArray<Obj> data(..,..,..,..,bench.bestBandwidth(2));
The data array then uses the specified bandwidth ratios, which map well to the physical cards' communication performance when element accesses are sufficiently concurrent.
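To get a feel for what a multiplier set means, it can help to convert it into the fraction of traffic each physical card would carry. The sketch below assumes accesses are spread evenly over all virtual GPUs, which is an assumption made here for illustration and not something the library guarantees; `trafficShares` is a hypothetical helper.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the library).
// Returns each card's share of total traffic, assuming every virtual GPU
// (data channel) receives the same number of accesses.
std::vector<double> trafficShares(const std::vector<int>& multipliers)
{
    const int total = std::accumulate(multipliers.begin(), multipliers.end(), 0);
    assert(total > 0);

    std::vector<double> shares;
    shares.reserve(multipliers.size());
    for (int m : multipliers)
    {
        // a card with m channels out of `total` carries m/total of the traffic
        shares.push_back(static_cast<double>(m) / total);
    }
    return shares;
}
```

For the {3,4,2} result above, this gives shares of 3/9, 4/9 and 2/9, so the card behind the slowest bridge would handle roughly 22% of the accesses under that assumption.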