Commit 4984714

[BOLT] Fix density for jump-through functions (#145619)
Address the issue that stems from how the density is computed.

Binary *function* density is the ratio of its total dynamic number of executed bytes over the static size in bytes. It measures the amount of dynamic profile information relative to the function's static size.

Binary *profile* density is the minimum *function* density among *well-profiled* functions, taken as functions covering p99 samples, or, in other words, excluding functions in the tail 1% of samples. p99 is an arbitrary cutoff. Profile density represents the *minimum amount of profile information per function* needed to optimize the program well. The threshold for profile density is set empirically.

The dynamically executed bytes are taken directly from LBR fall-throughs, and for LBRs recorded in trampoline functions, such as

```
000000001a941ec0 <Sleef_expf8_u10>:
1a941ec0: jmpq *0x37b911fa(%rip) # <pnt_expf8_u10>
1a941ec6: nopw %cs:(%rax,%rax)
```

the fall-through has zero length:

```
# Branch Target NextBranch Count
T 1b171cf6 1a941ec0 1a941ec0 568562
```

But it's not correct to say this function has zero executed bytes; it is only that the size of the next branch is not included in the fall-through. If such functions have a non-trivial sample count, they fall within the p99 samples and cause the profile density to be zero.

To solve this, we can either:

1. Include the fall-through end jump size in executed bytes: this is logically sound but technically challenging. The size needs to come from disassembly (expensive), and the threshold needs to be reevaluated with the updated definition of binary function density.
2. Exclude pass-through functions from density computation: this follows the intent of profile density, which is to set the amount of profile information needed to optimize the function well. Single-instruction pass-through functions don't need samples many times their size to be optimized well.

Go with option 2 as a reasonable compromise.

Test Plan: added bolt/test/X86/zero-density.s
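To make the density definition above concrete, here is a minimal sketch of how a zero-length fall-through leads to zero function density. It is not BOLT's actual code: the `Trace` struct and the `executedBytes` and `functionDensity` helpers are hypothetical names, and the model assumes executed bytes are simply the fall-through length times the trace count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one LBR trace: a branch lands at Target
// and execution falls through until the next branch at NextBranch.
struct Trace {
  uint64_t Target;
  uint64_t NextBranch;
  uint64_t Count;
};

// Bytes executed across all traces, counting only the fall-through range
// [Target, NextBranch). The next branch instruction itself is not included,
// which is the limitation the commit message describes.
uint64_t executedBytes(const std::vector<Trace> &Traces) {
  uint64_t Bytes = 0;
  for (const Trace &T : Traces) {
    assert(T.NextBranch >= T.Target && "fall-through must not go backwards");
    Bytes += (T.NextBranch - T.Target) * T.Count;
  }
  return Bytes;
}

// Function density: dynamically executed bytes over static size in bytes.
double functionDensity(const std::vector<Trace> &Traces, uint64_t StaticSize) {
  assert(StaticSize > 0 && "static size must be non-zero");
  return static_cast<double>(executedBytes(Traces)) / StaticSize;
}
```

Feeding in the trampoline trace from the message (`Target == NextBranch == 0x1a941ec0`, count 568562) yields zero executed bytes and therefore zero density for any static size, even though the function was entered more than half a million times.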
1 parent 4cb8308 commit 4984714

2 files changed: +48 -17 lines changed

bolt/lib/Passes/BinaryPasses.cpp

Lines changed: 16 additions & 17 deletions
@@ -1765,27 +1765,26 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
 
   if (opts::ShowDensity) {
     double Density = 0.0;
-    // Sorted by the density in descending order.
-    llvm::stable_sort(FuncDensityList,
-                      [&](const std::pair<double, uint64_t> &A,
-                          const std::pair<double, uint64_t> &B) {
-                        if (A.first != B.first)
-                          return A.first > B.first;
-                        return A.second < B.second;
-                      });
+    llvm::sort(FuncDensityList);
 
     uint64_t AccumulatedSamples = 0;
-    uint32_t I = 0;
     assert(opts::ProfileDensityCutOffHot <= 1000000 &&
            "The cutoff value is greater than 1000000(100%)");
-    while (AccumulatedSamples <
-               TotalSampleCount *
-                   static_cast<float>(opts::ProfileDensityCutOffHot) /
-                   1000000 &&
-           I < FuncDensityList.size()) {
-      AccumulatedSamples += FuncDensityList[I].second;
-      Density = FuncDensityList[I].first;
-      I++;
+    // Subtract samples in zero-density functions (no fall-throughs) from
+    // TotalSampleCount (not used anywhere below).
+    for (const auto [CurDensity, CurSamples] : FuncDensityList) {
+      if (CurDensity != 0.0)
+        break;
+      TotalSampleCount -= CurSamples;
+    }
+    const uint64_t CutoffSampleCount =
+        1.f * TotalSampleCount * opts::ProfileDensityCutOffHot / 1000000;
+    // Process functions in decreasing density order
+    for (const auto [CurDensity, CurSamples] : llvm::reverse(FuncDensityList)) {
+      if (AccumulatedSamples >= CutoffSampleCount)
+        break;
+      AccumulatedSamples += CurSamples;
+      Density = CurDensity;
     }
     if (Density == 0.0) {
       BC.errs() << "BOLT-WARNING: the output profile is empty or the "
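The updated selection logic can be tried outside of BOLT with a standalone sketch like the one below. It mirrors the hunk above under a few assumptions: `profileDensity` and `DensityList` are hypothetical names, the total sample count is recomputed from the list rather than taken from `TotalSampleCount`, `CutoffPPM` stands in for `opts::ProfileDensityCutOffHot`, and `double` is used where the patch uses `float`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Each entry is {function density, function sample count}.
using DensityList = std::vector<std::pair<double, uint64_t>>;

// Returns the lowest density among the densest functions that together cover
// CutoffPPM / 1000000 of the samples, ignoring zero-density functions.
double profileDensity(DensityList Funcs, uint32_t CutoffPPM) {
  assert(CutoffPPM <= 1000000 && "cutoff is expressed in parts per million");

  // Assumption: the total is the sum of the listed per-function samples.
  uint64_t TotalSamples = 0;
  for (const auto &F : Funcs)
    TotalSamples += F.second;

  // Ascending lexicographic sort puts zero-density functions first.
  std::sort(Funcs.begin(), Funcs.end());

  // Drop samples of zero-density functions from the total.
  for (const auto &F : Funcs) {
    if (F.first != 0.0)
      break;
    TotalSamples -= F.second;
  }

  const uint64_t CutoffSamples = static_cast<uint64_t>(
      static_cast<double>(TotalSamples) * CutoffPPM / 1000000);

  // Walk functions in decreasing density order until the cutoff is covered.
  double Density = 0.0;
  uint64_t Accumulated = 0;
  for (auto It = Funcs.rbegin(); It != Funcs.rend(); ++It) {
    if (Accumulated >= CutoffSamples)
      break;
    Accumulated += It->second;
    Density = It->first;
  }
  return Density;
}
```

With a 99% cutoff (`990000`), a list holding a zero-density function with 2 samples and a function of density 3.5 with 10 samples yields 3.5: the 2 samples are subtracted from the total first, so the walk stops before it reaches the zero-density entry.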

bolt/test/X86/zero-density.s

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+## Check that trampoline functions are excluded from density computation.
+
+# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown %s -o %t.o
+# RUN: ld.lld %t.o -o %t
+# RUN: link_fdata %s %t %t.preagg PREAGG
+# RUN: llvm-strip -NLjmp %t
+# RUN: perf2bolt %t -p %t.preagg --pa -o %t.fdata | FileCheck %s
+# CHECK: Functions with density >= {{.*}} account for 99.00% total sample counts.
+# CHECK-NOT: the output profile is empty or the --profile-density-cutoff-hot option is set too low.
+
+  .text
+  .globl trampoline
+trampoline:
+  mov main,%rax
+  jmpq *%rax
+.size trampoline,.-trampoline
+# PREAGG: f #trampoline# #trampoline# 2
+
+  .globl main
+main:
+  .cfi_startproc
+  vmovaps %zmm31,%zmm3
+
+  add $0x4,%r9
+  add $0x40,%r10
+  dec %r14
+Ljmp:
+  jne main
+# PREAGG: T #Ljmp# #main# #Ljmp# 10
+  ret
+  .cfi_endproc
+.size main,.-main
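To see what the `CHECK-NOT` line guards against, the following sketch reproduces the selection loop as it was before this commit and runs it on numbers loosely modeled on the test. The function name `oldProfileDensity`, the recomputed total, and the inputs (a zero-density trampoline with 2 samples, a `main` with 10 samples and a made-up density of 3.5) are illustrative assumptions, not values produced by BOLT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Each entry is {function density, function sample count}.
using DensityList = std::vector<std::pair<double, uint64_t>>;

// Sketch of the pre-commit selection: zero-density functions still count
// toward the total, so the walk may have to reach them to hit the cutoff.
double oldProfileDensity(DensityList Funcs, uint32_t CutoffPPM) {
  assert(CutoffPPM <= 1000000 && "cutoff is expressed in parts per million");

  // Assumption: the total is the sum of the listed per-function samples.
  uint64_t TotalSamples = 0;
  for (const auto &F : Funcs)
    TotalSamples += F.second;

  // Sorted by density in descending order, ties broken by sample count.
  std::stable_sort(Funcs.begin(), Funcs.end(),
                   [](const std::pair<double, uint64_t> &A,
                      const std::pair<double, uint64_t> &B) {
                     if (A.first != B.first)
                       return A.first > B.first;
                     return A.second < B.second;
                   });

  double Density = 0.0;
  uint64_t Accumulated = 0;
  std::size_t I = 0;
  while (Accumulated <
             TotalSamples * static_cast<float>(CutoffPPM) / 1000000 &&
         I < Funcs.size()) {
    Accumulated += Funcs[I].second;
    Density = Funcs[I].first;
    ++I;
  }
  return Density;
}
```

For the inputs `{0.0, 2}` and `{3.5, 10}` with a 99% cutoff, the 10 samples of `main` fall short of 99% of 12, so the loop continues into the trampoline and ends with a density of zero, which is the condition that triggers the "output profile is empty" warning.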
