
Commit 81f2331

Bill Nell authored and memfrob committed
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling.
Summary:
Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites.

The basic process is the following:

1. Work backwards from the callsite to find the most recent def of the call register.
2. Work back from the call register def to find the instruction where the vtable is loaded.
3. Find out if there is any value profiling data associated with the vtable load. If so, record all these addresses as potential vtables + method offsets.
4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address. At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register. The result of this execution should be the method offset.
5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset. Make sure that this address maps to an actual function symbol.
6. Try to associate a vtable pointer with each target address in SymTargets. If every target has a vtable, then this is almost certainly a virtual method callsite.
7. Use the vtable address when generating the promoted call code. It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer. Additionally, the instructions to load up the method are dumped into the cold call block.

For jump tables, the basic idea is the same. I use the memory profiling data to find the hottest slots in the jump table and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table.

Note: I'm assuming the whole call is in a single BB. According to @rafaelauler, this isn't always the case on ARM. It isn't always the case on X86 either: if there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call. I'm going to leave fixing this until later since it makes things a bit more complicated.

I've also fixed a bug where ICP was introducing a conditional tail call. I made sure that SCTC fixes these up afterwards. I have no idea why I made it introduce a CTC in the first place.

(cherry picked from FBD6120768)
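The address arithmetic in steps 4 and 5, and the analogous jump-table index computation, can be sketched as a standalone model. This is not BOLT's implementation: `DataSection`, `VTableTarget`, `resolveVTableTarget`, and `jumpTableIndex` are hypothetical names, the method offset is assumed to have already been recovered by the virtual execution in step 4, vtable slots are assumed to be 8-byte little-endian pointers, and a plain address-to-name map stands in for the function symbol lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a data section that holds vtables.
struct DataSection {
  uint64_t Address;               // load address of the section
  std::vector<uint8_t> Contents;  // raw section bytes
};

// Read an 8-byte pointer stored at absolute address Addr. memcpy copies the
// bytes in host order, so this assumes host and target are both little-endian.
std::optional<uint64_t> readPointer(const DataSection &S, uint64_t Addr) {
  if (Addr < S.Address || S.Contents.size() < sizeof(uint64_t) ||
      Addr - S.Address > S.Contents.size() - sizeof(uint64_t))
    return std::nullopt;
  uint64_t Value;
  std::memcpy(&Value, S.Contents.data() + (Addr - S.Address), sizeof(Value));
  return Value;
}

struct VTableTarget {
  uint64_t VTable;     // vtable base to compare against in the promoted call
  std::string Method;  // function symbol stored in the vtable slot
};

// Steps 4-5: a profiled load address is vtable + method offset. Subtracting
// the method offset gives the vtable base, and the slot itself must hold the
// address of a known function symbol.
std::optional<VTableTarget>
resolveVTableTarget(uint64_t ProfiledAddr, uint64_t MethodOffset,
                    const DataSection &S,
                    const std::map<uint64_t, std::string> &FunctionSymbols) {
  if (ProfiledAddr < MethodOffset)
    return std::nullopt;
  std::optional<uint64_t> MethodAddr = readPointer(S, ProfiledAddr);
  if (!MethodAddr)
    return std::nullopt;
  auto It = FunctionSymbols.find(*MethodAddr);
  if (It == FunctionSymbols.end())
    return std::nullopt;  // not a function: don't treat this as a vtable slot
  return VTableTarget{ProfiledAddr - MethodOffset, It->second};
}

// Jump-table analogue: map a hot slot address back to the index that the
// index register can be compared against, skipping the table load.
std::optional<uint64_t> jumpTableIndex(uint64_t SlotAddr, uint64_t TableBase,
                                       uint64_t EntrySize) {
  assert(EntrySize != 0 && "jump table entries must have a size");
  if (SlotAddr < TableBase || (SlotAddr - TableBase) % EntrySize != 0)
    return std::nullopt;
  return (SlotAddr - TableBase) / EntrySize;
}
```

A promoted call would then compare the loaded vtable pointer against each resolved `VTable` (step 7), leaving the method-pointer load to the cold call block.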
1 parent c588b5e commit 81f2331

13 files changed (+1000 / -162 lines changed)

bolt/BinaryBasicBlock.h

Lines changed: 8 additions & 0 deletions
@@ -642,6 +642,14 @@ class BinaryBasicBlock {
     return Instructions.erase(II);
   }
 
+  /// Erase instructions in the specified range.
+  template <typename ItrType>
+  void eraseInstructions(ItrType Begin, ItrType End) {
+    while (End > Begin) {
+      eraseInstruction(*--End);
+    }
+  }
+
   /// Erase all instructions
   void clear() {
     Instructions.clear();
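One plausible reason `eraseInstructions` walks its range from `End` back to `Begin`: erasing from a vector invalidates references to the erased element and everything after it, so deleting the last target first keeps references to earlier targets valid. The sketch below is a hypothetical stand-in, not `BinaryBasicBlock`: instructions are plain `int`s, and the single-instruction `eraseInstruction` taking a pointer is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BinaryBasicBlock: instructions are plain ints.
class Block {
public:
  std::vector<int> Instructions;

  // Erase the instruction that Inst points to. std::vector::erase invalidates
  // pointers to this element and to every element after it.
  void eraseInstruction(int *Inst) {
    assert(Inst >= Instructions.data() &&
           Inst < Instructions.data() + Instructions.size());
    Instructions.erase(Instructions.begin() + (Inst - Instructions.data()));
  }

  // Same loop shape as eraseInstructions in the diff: visit the range back to
  // front so pointers to earlier instructions are still valid when used.
  // `End > Begin` requires random-access iterators.
  template <typename ItrType>
  void eraseInstructions(ItrType Begin, ItrType End) {
    while (End > Begin) {
      eraseInstruction(*--End);
    }
  }
};
```

Erasing the same two targets front to back would remove `20` first and leave the pointer to `40` invalidated before it is used.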

bolt/BinaryContext.cpp

Lines changed: 89 additions & 9 deletions
@@ -36,8 +36,28 @@ PrintDebugInfo("print-debug-info",
   cl::Hidden,
   cl::cat(BoltCategory));
 
+static cl::opt<bool>
+PrintRelocations("print-relocations",
+  cl::desc("print relocations when printing functions"),
+  cl::Hidden,
+  cl::cat(BoltCategory));
+
+static cl::opt<bool>
+PrintMemData("print-mem-data",
+  cl::desc("print memory data annotations when printing functions"),
+  cl::Hidden,
+  cl::cat(BoltCategory));
+
 } // namespace opts
 
+namespace llvm {
+namespace bolt {
+extern void check_error(std::error_code EC, StringRef Message);
+}
+}
+
+Triple::ArchType Relocation::Arch;
+
 BinaryContext::~BinaryContext() { }
 
 MCObjectWriter *BinaryContext::createObjectWriter(raw_pwrite_stream &OS) {
@@ -326,7 +346,9 @@ void BinaryContext::printInstruction(raw_ostream &OS,
                                      const MCInst &Instruction,
                                      uint64_t Offset,
                                      const BinaryFunction* Function,
-                                     bool printMCInst) const {
+                                     bool PrintMCInst,
+                                     bool PrintMemData,
+                                     bool PrintRelocations) const {
   if (MIA->isEHLabel(Instruction)) {
     OS << " EH_LABEL: " << *MIA->getTargetSymbol(Instruction) << '\n';
     return;
@@ -392,24 +414,58 @@ void BinaryContext::printInstruction(raw_ostream &OS,
     }
   }
 
-  auto *MD = Function ? DR.getFuncMemData(Function->getNames()) : nullptr;
-  if (MD) {
-    bool DidPrint = false;
-    for (auto &MI : MD->getMemInfoRange(Offset)) {
-      OS << (DidPrint ? ", " : " # Loads: ");
-      OS << MI.Addr << "/" << MI.Count;
-      DidPrint = true;
+  if ((opts::PrintMemData || PrintMemData) && Function) {
+    const auto *MD = Function->getMemData();
+    const auto MemDataOffset =
+      MIA->tryGetAnnotationAs<uint64_t>(Instruction, "MemDataOffset");
+    if (MD && MemDataOffset) {
+      bool DidPrint = false;
+      for (auto &MI : MD->getMemInfoRange(MemDataOffset.get())) {
+        OS << (DidPrint ? ", " : " # Loads: ");
+        OS << MI.Addr << "/" << MI.Count;
+        DidPrint = true;
+      }
     }
   }
 
+  if ((opts::PrintRelocations || PrintRelocations) && Function) {
+    const auto Size = computeCodeSize(&Instruction, &Instruction + 1);
+    Function->printRelocations(OS, Offset, Size);
+  }
+
   OS << "\n";
 
-  if (printMCInst) {
+  if (PrintMCInst) {
     Instruction.dump_pretty(OS, InstPrinter.get());
     OS << "\n";
   }
 }
 
+ErrorOr<ArrayRef<uint8_t>>
+BinaryContext::getFunctionData(const BinaryFunction &Function) const {
+  auto Section = Function.getSection();
+  assert(Section.getAddress() <= Function.getAddress() &&
+         Section.getAddress() + Section.getSize()
+           >= Function.getAddress() + Function.getSize() &&
+         "wrong section for function");
+
+  if (!Section.isText() || Section.isVirtual() || !Section.getSize()) {
+    return std::make_error_code(std::errc::bad_address);
+  }
+
+  StringRef SectionContents;
+  check_error(Section.getContents(SectionContents),
+              "cannot get section contents");
+
+  assert(SectionContents.size() == Section.getSize() &&
+         "section size mismatch");
+
+  // Function offset from the section start.
+  auto FunctionOffset = Function.getAddress() - Section.getAddress();
+  auto *Bytes = reinterpret_cast<const uint8_t *>(SectionContents.data());
+  return ArrayRef<uint8_t>(Bytes + FunctionOffset, Function.getSize());
+}
+
 ErrorOr<SectionRef> BinaryContext::getSectionForAddress(uint64_t Address) const{
   auto SI = AllocatableSections.upper_bound(Address);
   if (SI != AllocatableSections.begin()) {
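The checks in `getFunctionData` can be modeled outside LLVM with a few hypothetical stand-ins: `Section`, `Function`, and `ByteView` replace `SectionRef`, `BinaryFunction`, and `ArrayRef<uint8_t>`, and a `std::error_code` return value with an out-parameter replaces `ErrorOr`. Unlike the diff, this sketch tests the section kind before asserting containment, because a virtual section has no bytes in this model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <vector>

// Hypothetical, simplified section and function descriptors.
struct Section {
  uint64_t Address;
  std::vector<uint8_t> Contents;  // empty for virtual sections such as .bss
  bool IsText;
  bool IsVirtual;
};

struct Function {
  uint64_t Address;
  uint64_t Size;
};

// Non-owning view of a function's bytes inside its section.
struct ByteView {
  const uint8_t *Data = nullptr;
  std::size_t Size = 0;
};

// Only non-empty, non-virtual text sections qualify, and the function must
// lie entirely inside the section.
std::error_code getFunctionData(const Section &Sec, const Function &F,
                                ByteView &Out) {
  if (!Sec.IsText || Sec.IsVirtual || Sec.Contents.empty())
    return std::make_error_code(std::errc::bad_address);
  assert(Sec.Address <= F.Address &&
         F.Address + F.Size <= Sec.Address + Sec.Contents.size() &&
         "wrong section for function");
  // Function offset from the section start.
  uint64_t Offset = F.Address - Sec.Address;
  Out = ByteView{Sec.Contents.data() + Offset, static_cast<std::size_t>(F.Size)};
  return {};
}
```

The returned view borrows the section's storage, so it is only valid while the section contents stay alive, which is the same lifetime rule that applies to an `ArrayRef`.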
@@ -640,3 +696,27 @@ size_t Relocation::emit(MCStreamer *Streamer) const {
   }
   return Size;
 }
+
+#define ELF_RELOC(name, value) #name,
+
+void Relocation::print(raw_ostream &OS) const {
+  static const char *X86RelocNames[] = {
+#include "llvm/Support/ELFRelocs/x86_64.def"
+  };
+  static const char *AArch64RelocNames[] = {
+#include "llvm/Support/ELFRelocs/AArch64.def"
+  };
+  if (Arch == Triple::aarch64)
+    OS << AArch64RelocNames[Type];
+  else
+    OS << X86RelocNames[Type];
+  OS << ", 0x" << Twine::utohexstr(Offset);
+  if (Symbol) {
+    OS << ", " << Symbol->getName();
+  }
+  if (int64_t(Addend) < 0)
+    OS << ", -0x" << Twine::utohexstr(-int64_t(Addend));
+  else
+    OS << ", 0x" << Twine::utohexstr(Addend);
+  OS << ", 0x" << Twine::utohexstr(Value);
+}
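`Relocation::print` builds its name tables with an X-macro: defining `ELF_RELOC(name, value)` as `#name,` turns every entry of the included `.def` list into a string literal. The sketch below reproduces the trick without LLVM. `SKETCH_X86_64_RELOCS` is a hypothetical inline copy of the first five x86-64 entries (standing in for the `#include`), and `formatReloc` is a hypothetical helper that follows the print format in the diff. It prints a negative addend via unsigned negation to avoid signed overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical inline copy of the first entries of x86_64.def.
#define SKETCH_X86_64_RELOCS      \
  ELF_RELOC(R_X86_64_NONE, 0)     \
  ELF_RELOC(R_X86_64_64, 1)       \
  ELF_RELOC(R_X86_64_PC32, 2)     \
  ELF_RELOC(R_X86_64_GOT32, 3)    \
  ELF_RELOC(R_X86_64_PLT32, 4)

// Each entry expands to its stringized name. Indexing the array by relocation
// type is only correct because this list is dense and starts at 0.
#define ELF_RELOC(name, value) #name,
static const char *X86RelocNames[] = {SKETCH_X86_64_RELOCS};
#undef ELF_RELOC

// "<type>, 0x<offset>[, <symbol>], [-]0x<addend>, 0x<value>"
std::string formatReloc(uint64_t Type, uint64_t Offset, const char *Symbol,
                        uint64_t Addend, uint64_t Value) {
  assert(Type < sizeof(X86RelocNames) / sizeof(X86RelocNames[0]));
  std::ostringstream OS;
  OS << X86RelocNames[Type] << std::hex;
  OS << ", 0x" << Offset;
  if (Symbol)
    OS << ", " << Symbol;
  if (static_cast<int64_t>(Addend) < 0)
    OS << ", -0x" << (uint64_t(0) - Addend);  // magnitude of a negative addend
  else
    OS << ", 0x" << Addend;
  OS << ", 0x" << Value;
  return OS.str();
}
```

Direct indexing depends on dense values; a sparse list such as AArch64's (where `R_AARCH64_ABS64` is 0x101) would need a lookup keyed by value instead.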

bolt/BinaryContext.h

Lines changed: 33 additions & 4 deletions
@@ -55,6 +55,7 @@ class DataReader;
 
 /// Relocation class.
 struct Relocation {
+  static Triple::ArchType Arch; /// for printing, set by BinaryContext ctor.
   uint64_t Offset;
   mutable MCSymbol *Symbol; /// mutable to allow modification by emitter.
   uint64_t Type;
@@ -78,13 +79,21 @@ struct Relocation {
   /// Emit relocation at a current \p Streamer' position. The caller is
   /// responsible for setting the position correctly.
   size_t emit(MCStreamer *Streamer) const;
+
+  /// Print a relocation to \p OS.
+  void print(raw_ostream &OS) const;
 };
 
 /// Relocation ordering by offset.
 inline bool operator<(const Relocation &A, const Relocation &B) {
   return A.Offset < B.Offset;
 }
 
+inline raw_ostream &operator<<(raw_ostream &OS, const Relocation &Rel) {
+  Rel.print(OS);
+  return OS;
+}
+
 class BinaryContext {
 
   BinaryContext() = delete;
@@ -199,7 +208,9 @@ class BinaryContext {
       MIA(std::move(MIA)),
       MRI(std::move(MRI)),
       DisAsm(std::move(DisAsm)),
-      DR(DR) {}
+      DR(DR) {
+    Relocation::Arch = this->TheTriple->getArch();
+  }
 
   ~BinaryContext();

@@ -215,13 +226,26 @@ class BinaryContext {
   /// global symbol was registered at the location.
   MCSymbol *getGlobalSymbolAtAddress(uint64_t Address) const;
 
+  /// Find the address of the global symbol with the given \p Name.
+  /// return an error if no such symbol exists.
+  ErrorOr<uint64_t> getAddressForGlobalSymbol(StringRef Name) const {
+    auto Itr = GlobalSymbols.find(Name);
+    if (Itr != GlobalSymbols.end())
+      return Itr->second;
+    return std::make_error_code(std::errc::bad_address);
+  }
+
   /// Return MCSymbol for the given \p Name or nullptr if no
   /// global symbol with that name exists.
   MCSymbol *getGlobalSymbolByName(const std::string &Name) const;
 
   /// Print the global symbol table.
   void printGlobalSymbols(raw_ostream& OS) const;
 
+  /// Get the raw bytes for a given function.
+  ErrorOr<ArrayRef<uint8_t>>
+  getFunctionData(const BinaryFunction &Function) const;
+
   /// Return (allocatable) section containing the given \p Address.
   ErrorOr<SectionRef> getSectionForAddress(uint64_t Address) const;

@@ -340,7 +364,9 @@ class BinaryContext {
                         const MCInst &Instruction,
                         uint64_t Offset = 0,
                         const BinaryFunction *Function = nullptr,
-                        bool printMCInst = false) const;
+                        bool PrintMCInst = false,
+                        bool PrintMemData = false,
+                        bool PrintRelocations = false) const;
 
   /// Print a range of instructions.
   template <typename Itr>
@@ -349,9 +375,12 @@ class BinaryContext {
                          Itr End,
                          uint64_t Offset = 0,
                          const BinaryFunction *Function = nullptr,
-                         bool printMCInst = false) const {
+                         bool PrintMCInst = false,
+                         bool PrintMemData = false,
+                         bool PrintRelocations = false) const {
     while (Begin != End) {
-      printInstruction(OS, *Begin, Offset, Function, printMCInst);
+      printInstruction(OS, *Begin, Offset, Function, PrintMCInst,
+                       PrintMemData, PrintRelocations);
       Offset += computeCodeSize(Begin, Begin + 1);
       ++Begin;
     }

bolt/BinaryFunction.cpp

Lines changed: 78 additions & 2 deletions
@@ -30,6 +30,7 @@
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/GraphWriter.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/Regex.h"
 #include <limits>
 #include <queue>
 #include <string>
@@ -137,8 +138,16 @@ PrintOnly("print-only",
   cl::Hidden,
   cl::cat(BoltCategory));
 
+static cl::list<std::string>
+PrintOnlyRegex("print-only-regex",
+  cl::CommaSeparated,
+  cl::desc("list of function regexes to print"),
+  cl::value_desc("func1,func2,func3,..."),
+  cl::Hidden,
+  cl::cat(BoltCategory));
+
 bool shouldPrint(const BinaryFunction &Function) {
-  if (PrintOnly.empty())
+  if (PrintOnly.empty() && PrintOnlyRegex.empty())
     return true;
 
   for (auto &Name : opts::PrintOnly) {
@@ -147,6 +156,12 @@ bool shouldPrint(const BinaryFunction &Function) {
     }
   }
 
+  for (auto &Name : opts::PrintOnlyRegex) {
+    if (Function.hasNameRegex(Name)) {
+      return true;
+    }
+  }
+
   return false;
 }

@@ -160,6 +175,11 @@ constexpr unsigned BinaryFunction::MinAlign;
 
 namespace {
 
+template <typename R>
+bool emptyRange(const R &Range) {
+  return Range.begin() == Range.end();
+}
+
 /// Gets debug line information for the instruction located at the given
 /// address in the original binary. The SMLoc's pointer is used
 /// to point to this information, which is represented by a
@@ -227,6 +247,14 @@ bool DynoStats::lessThan(const DynoStats &Other,
 
 uint64_t BinaryFunction::Count = 0;
 
+bool BinaryFunction::hasNameRegex(const std::string &NameRegex) const {
+  Regex MatchName(NameRegex);
+  for (auto &Name : Names)
+    if (MatchName.match(Name))
+      return true;
+  return false;
+}
+
 BinaryBasicBlock *
 BinaryFunction::getBasicBlockContainingOffset(uint64_t Offset) {
   if (Offset > Size)
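The new `-print-only-regex` filter combines with `-print-only` as follows: with no filters every function is printed, otherwise a function is printed if any of its names matches. The sketch below uses only the standard library. `Function` is a hypothetical stand-in for `BinaryFunction`, the exact-name comparison for `-print-only` is an assumption (that part of `shouldPrint` is elided from the diff), and `std::regex_search` is used as an unanchored search in place of `llvm::Regex::match`; treat the equivalence of their semantics as an assumption.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical model: a function can carry several names (aliases).
struct Function {
  std::vector<std::string> Names;

  // Analogue of hasNameRegex: true if any name contains a match.
  bool hasNameRegex(const std::string &NameRegex) const {
    std::regex MatchName(NameRegex);
    for (const auto &Name : Names)
      if (std::regex_search(Name, MatchName))
        return true;
    return false;
  }
};

// Analogue of shouldPrint with both filter lists passed in explicitly.
bool shouldPrint(const Function &F, const std::vector<std::string> &PrintOnly,
                 const std::vector<std::string> &PrintOnlyRegex) {
  if (PrintOnly.empty() && PrintOnlyRegex.empty())
    return true;
  for (const auto &Name : PrintOnly)
    for (const auto &FName : F.Names)
      if (FName == Name)  // exact match: an assumption of this sketch
        return true;
  for (const auto &Regex : PrintOnlyRegex)
    if (F.hasNameRegex(Regex))
      return true;
  return false;
}
```

Because the regex is unanchored, a pattern like `Foo::` also matches `Foo::bar`; a caller who wants a whole-name match has to add `^` and `$` explicitly.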
@@ -558,6 +586,31 @@ void BinaryFunction::print(raw_ostream &OS, std::string Annotation,
   OS << "End of Function \"" << *this << "\"\n\n";
 }
 
+void BinaryFunction::printRelocations(raw_ostream &OS,
+                                      uint64_t Offset,
+                                      uint64_t Size) const {
+  const char* Sep = " # Relocs: ";
+
+  auto RI = Relocations.lower_bound(Offset);
+  while (RI != Relocations.end() && RI->first < Offset + Size) {
+    OS << Sep << "(R: " << RI->second << ")";
+    Sep = ", ";
+    ++RI;
+  }
+
+  RI = MoveRelocations.lower_bound(Offset);
+  while (RI != MoveRelocations.end() && RI->first < Offset + Size) {
+    OS << Sep << "(M: " << RI->second << ")";
+    Sep = ", ";
+    ++RI;
+  }
+
+  auto PI = PCRelativeRelocationOffsets.lower_bound(Offset);
+  if (PI != PCRelativeRelocationOffsets.end() && *PI < Offset + Size) {
+    OS << Sep << "(pcrel)";
+  }
+}
+
 IndirectBranchType BinaryFunction::processIndirectBranch(MCInst &Instruction,
                                                          unsigned Size,
                                                          uint64_t Offset) {
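`printRelocations` answers a range query on ordered containers: `lower_bound(Offset)` finds the first key at or after the instruction start, and the loop stops at the first key outside the half-open interval `[Offset, Offset + Size)`. The sketch below shows the same query with standard containers. `relocsInRange` is a hypothetical helper, relocations are modeled as an offset-to-name `std::map`, and PC-relative fixups as a `std::set` of offsets; the loop for `MoveRelocations` would be identical to the first one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Collect every entry whose key lies in [Offset, Offset + Size).
std::string relocsInRange(const std::map<uint64_t, std::string> &Relocations,
                          const std::set<uint64_t> &PCRelOffsets,
                          uint64_t Offset, uint64_t Size) {
  std::ostringstream OS;
  const char *Sep = " # Relocs: ";
  // lower_bound returns the first key >= Offset; stop once past the end.
  for (auto RI = Relocations.lower_bound(Offset);
       RI != Relocations.end() && RI->first < Offset + Size; ++RI) {
    OS << Sep << "(R: " << RI->second << ")";
    Sep = ", ";
  }
  // One marker is enough for PC-relative fixups, so only the first
  // candidate at or after Offset needs checking.
  auto PI = PCRelOffsets.lower_bound(Offset);
  if (PI != PCRelOffsets.end() && *PI < Offset + Size)
    OS << Sep << "(pcrel)";
  return OS.str();
}
```

Each query costs one logarithmic lookup plus the number of entries printed, so annotating every instruction of a function stays cheap even with many relocations.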
@@ -566,7 +619,7 @@ IndirectBranchType BinaryFunction::processIndirectBranch(MCInst &Instruction,
   // An instruction referencing memory used by jump instruction (directly or
   // via register). This location could be an array of function pointers
   // in case of indirect tail call, or a jump table.
-  const MCInst *MemLocInstr;
+  MCInst *MemLocInstr;
 
   // Address of the table referenced by MemLocInstr. Could be either an
   // array of function pointers, or a jump table.
@@ -834,6 +887,8 @@ void BinaryFunction::disassemble(ArrayRef<uint8_t> FunctionData) {
 
   DWARFUnitLineTable ULT = getDWARFUnitLineTable();
 
+  matchProfileMemData();
+
   // Insert a label at the beginning of the function. This will be our first
   // basic block.
   Labels[0] = Ctx->createTempSymbol("BB0", false);
@@ -1181,6 +1236,10 @@ void BinaryFunction::disassemble(ArrayRef<uint8_t> FunctionData) {
           findDebugLineInformationForInstructionAt(AbsoluteInstrAddr, ULT));
     }
 
+    if (MemData && !emptyRange(MemData->getMemInfoRange(Offset))) {
+      MIA->addAnnotation(Ctx.get(), Instruction, "MemDataOffset", Offset);
+    }
+
     addInstruction(Offset, std::move(Instruction));
   }

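During disassembly, an instruction is tagged with its input offset only when the memory profile has entries for that offset; `printInstruction` then reads the `MemDataOffset` annotation instead of the current offset. The sketch below reproduces that pattern with standard containers. `emptyRange` is copied from the diff; `IterRange`, `MemProfile`, `getMemInfoRange`, `Instruction`, and `annotateMemData` are hypothetical stand-ins for the real profile and `MCInst` annotation APIs. Keeping the original offset on the instruction, so the lookup still works after the instruction has moved, is our reading of the change rather than something the commit states.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Same helper as in the diff: works for any type with begin()/end().
template <typename R>
bool emptyRange(const R &Range) {
  return Range.begin() == Range.end();
}

// Hypothetical iterator pair so an equal_range result can be passed around.
template <typename It>
struct IterRange {
  It First, Last;
  It begin() const { return First; }
  It end() const { return Last; }
};

// Hypothetical memory profile: instruction offset -> (address, count).
using MemProfile = std::multimap<uint64_t, std::pair<uint64_t, uint64_t>>;

IterRange<MemProfile::const_iterator> getMemInfoRange(const MemProfile &MD,
                                                      uint64_t Offset) {
  auto P = MD.equal_range(Offset);
  return {P.first, P.second};
}

// Hypothetical instruction with string-keyed integer annotations.
struct Instruction {
  std::map<std::string, uint64_t> Annotations;
};

// Record the input offset only when the profile has samples for it.
void annotateMemData(Instruction &Inst, const MemProfile *MD, uint64_t Offset) {
  if (MD && !emptyRange(getMemInfoRange(*MD, Offset)))
    Inst.Annotations["MemDataOffset"] = Offset;
}
```

Skipping the annotation for instructions without samples keeps the per-instruction overhead limited to the loads that actually have profile data.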
@@ -1892,6 +1951,23 @@ bool BinaryFunction::fetchProfileForOtherEntryPoints() {
   return Updated;
 }
 
+void BinaryFunction::matchProfileMemData() {
+  const auto AllMemData = BC.DR.getFuncMemDataRegex(getNames());
+  for (auto *NewMemData : AllMemData) {
+    // Prevent functions from sharing the same profile.
+    if (NewMemData->Used)
+      continue;
+
+    if (MemData)
+      MemData->Used = false;
+
+    // Update function profile data with the new set.
+    MemData = NewMemData;
+    MemData->Used = true;
+    break;
+  }
+}
+
 void BinaryFunction::matchProfileData() {
   // This functionality is available for LBR-mode only
   // TODO: Implement evaluateProfileData() for samples, checking whether
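`matchProfileMemData` claims the first candidate profile that no other function has taken, releasing whatever profile the function held before. The sketch below isolates that claim-and-release logic. `FuncMemData` is a hypothetical stand-in for the reader's per-function memory data, and the candidate list is passed in directly in whatever order the lookup returned it, rather than coming from `getFuncMemDataRegex`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-function memory profile record.
struct FuncMemData {
  std::string Name;
  bool Used = false;  // set once some function has claimed this profile
};

// Claim the first unused candidate; release the previously held profile.
void matchProfileMemData(FuncMemData *&MemData,
                         const std::vector<FuncMemData *> &AllMemData) {
  for (auto *NewMemData : AllMemData) {
    // Prevent functions from sharing the same profile.
    if (NewMemData->Used)
      continue;
    if (MemData)
      MemData->Used = false;
    // Update function profile data with the new set.
    MemData = NewMemData;
    MemData->Used = true;
    break;
  }
}
```

If every candidate is already claimed, the function keeps whatever profile it had (possibly none), so two functions can never end up pointing at the same record.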
